Psychometrika 


VOLUME XVII—1952 
JANUARY-DECEMBER 





Editorial Council 


Chairman: L. L. THURSTONE    Managing Editor: DOROTHY C. ADKINS

Editors: M. W. RICHARDSON, PAUL HORST    Assistant Managing Editor: SAMUEL B. LYERLY


Editorial Board

R. L. ANDERSON      HAROLD GULLIKSEN       GEORGE E. NICHOLSON
J. B. CARROLL       CHARLES M. HARSH       M. W. RICHARDSON
H. S. CONRAD        PAUL HORST             P. J. RULON
L. J. CRONBACH      ALSTON S. HOUSEHOLDER  WM. STEPHENSON
E. E. CURETON       LYLE V. JONES          GODFREY THOMSON
ALLEN EDWARDS       TRUMAN L. KELLEY       R. L. THORNDIKE
MAX D. ENGELHART    ALBERT K. KURTZ        L. L. THURSTONE
HENRY E. GARRETT    IRVING LORGE           LEDYARD TUCKER
J. P. GUILFORD      QUINN MCNEMAR          S. S. WILKS
FREDERICK MOSTELLER





PUBLISHED QUARTERLY 


By THE PSYCHOMETRIC SOCIETY 
AT 1407 SHERWOOD AVENUE 
RICHMOND 5, VIRGINIA 













Psychometrika 





CONTENTS 


THE SELECTIVE EFFICIENCY OF A TEST BATTERY - - - 1
HERBERT S. SICHEL

NOTES ON THE COMPUTATION OF BISERIAL CORRELATIONS IN ITEM ANALYSIS - - - 41
LAURENCE SIEGEL and EDWARD E. CURETON

FACTOR ANALYSIS OF THE ARMY AIR FORCES SHEPPARD FIELD BATTERY OF EXPERIMENTAL APTITUDE TESTS - - - 46
J. P. GUILFORD, BENJAMIN FRUCHTER, and WAYNE S. ZIMMERMAN

THE EFFECT OF DIFFICULTY AND CHANCE SUCCESS ON ITEM-TEST CORRELATIONS AND TEST RELIABILITY - - - 69
LYNNETTE B. PLUMLEE

A FACTOR ANALYSIS OF WOMEN’S MEASUREMENTS TAKEN FOR GARMENT AND PATTERN CONSTRUCTION - - - 87
HELEN HEATH

A TECHNIQUE FOR CRITERION-KEYING AND SELECTING TEST ITEMS - - - 101
JOHN FRENCH

A FACTORIAL STUDY OF TEMPERAMENT - - - 107








VOLUME SEVENTEEN MARCH 1952 NUMBER ONE 





BIOSTATISTICS CONFERENCE
June 16-July 23, 1952
Iowa State College, Ames, Iowa





A biostatistics conference has been scheduled for the first session
of the 1952 Summer Quarter at Ames, Iowa, sponsored by faculty
members working in agriculture, biology, and statistics at Iowa State
College and by The Biometric Society (ENAR). The five-week
conference is arranged so that persons who cannot attend the entire
conference can advantageously come for one or more weeks. Iowa State
College is giving the Conference financial support.

The plan is that each morning a biologist will present a problem,
outline the objectives, and describe techniques suitable for the
experiment and analysis. A paired statistician will then discuss suitable
experimental designs and statistical and mathematical methods for
attacking the problem. These speakers will preside over a general
discussion of the same topic that afternoon.

The program is tentatively arranged in five somewhat separate 
weekly units as follows: 


First week: Development of Quantitative Biology 

Second week: Specification of Populations and Their Processes 
Third week: The Estimation of Populations 

Fourth week: Individual Growth 


Fifth week: Biomathematical Mechanisms Within the Individual and 
Species 


It is expected that the Conference will be of interest to advanced 
undergraduates, graduate students, and research workers in the vari- 
ous biological sciences and to statisticians who are interested in sta- 
tistics as a research tool. Some graduate credit in Statistics at Iowa 
State College will be allowed for attendance and study during the 
Conference. 

Rooms will be available in the college dormitories at the usual 
rates. For more detailed information write: T. A. Bancroft, Director, 
Statistical Laboratory, Iowa State College, Ames, Iowa. 

















PSYCHOMETRIKA—VOL. 17, NO. 1 
MARCH, 1952 


THE SELECTIVE EFFICIENCY OF A TEST BATTERY* 
HERBERT S. SICHEL 
NATIONAL INSTITUTE FOR PERSONNEL RESEARCH, SOUTH AFRICAN COUNCIL 
FOR SCIENTIFIC AND INDUSTRIAL RESEARCH 


In industrial acceptance sampling one frequently makes use of 
operating characteristic curves to describe the discriminating power 
of a particular sampling plan. Similarly, it is possible to demonstrate
the selective efficiency of a test battery in terms of (a) the
Applicant’s Operating Characteristic (A.O.C.); (b) the Selector’s
Operating Characteristic (S.O.C.). The A.O.C. determines the chance
of selection by means of a test for any given level of true ability.
The S.O.C. functionally connects the probability of success on the
criterion with the predictor scores of a battery. For the case of a
normal bivariate distribution the exact mathematical expressions of the
OC curves are derived in terms of the correlation coefficient ρ, the
cut-off points α and β, and the predictor and criterion scores X and
Y (in standard measures). The Efficiency Index H is defined as
the percentage of successful subjects gained by the use of a test
battery, taking chance selection as a yardstick for comparison. Its
optimum, for fixed ρ and α, is derived. The distribution law of the
criterion scores of selectees is deduced and its first four moments 
are shown to depart little from normality for cases usually encoun- 
tered in practice. A “Quality-Gain” diagram graphically illustrates 
the improvements secured. Another simple device, the “Cost-Utility” 
diagram, explains to management the full implications of selecting 
personnel by means of a test battery. Neither of the diagrams re- 
quires an understanding of the correlation coefficient. The confidence 
belt of the OC curves, the standard error of the mean criterion score 
of selectees and the standard error of the predicted number of suc- 
cessful applicants are determined. Finally, the full theory is applied 
in detail to a real test battery. 


In a large scale selection program we may distinguish three par- 
ties all having somewhat different approaches and interests at heart. 
They are: 


(a) The men seeking employment, entrance to a college, scholarships, 
induction into an army, etc. We shall call them the applicants. 
(b) The selection agency, which may be a personnel department, a
draft board, a selection committee, or a psychological unit,
hereafter called the selector.
(c) The agency for which selection is to be carried out such as man- 
agement, the army, a university, in short, the employer. 
*This paper is the result of an investigation undertaken by the South African 
National Institute for Personnel Research, Johannesburg, and was completed at 


the Educational Testing Service, Princeton. The author wishes to thank these 
organizations for permission to publish this paper. 







In the following, an attempt has been made to describe quantitatively 
what risks the various parties run in going through or having in- 
stalled a selection program. The principal instruments of measure- 
ment are the applicant’s and selector’s operating characteristics (OC 
curves), the efficiency index of a selection procedure, the quality-gain 
diagram, and the cost-utility diagram. 


I. The Mathematical Model 


(a) The Applicant’s Operating Characteristic (A.O.C. Curve)

Whether it is a job in an industrial enterprise or admission to a
pilot training course, the individual applicant’s main interest lies in
being picked for the vacancy for which he is competing with other
candidates. When turned down, he may question the competence of the
selector, feeling that his true potential was not recognized. Aptitude-test
selection in particular has come under heavy fire. It is said that
this procedure is wholly undemocratic, as it takes no account of the
individual. It is supposed to work only in the employer’s interest, as it
creams off the best test scorers but gives no consideration to the
individual who may have all the abilities requisite for the job but
is unlucky enough to fall below the cut-off point on the battery.























[FIG. 1 BIVARIATE DISTRIBUTION of predictor scores and true ability; horizontal axis: true ability measure y]





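Let us assume that predictor scores and true criterion ability
measures are normally correlated as shown in Figure 1. For applicants
whose true ability is

$$l_1 < y < l_2,$$

the probability of scoring equal to or above the predictor cut-off point
α is clearly

$$\text{prob.}(x \ge \alpha) = \frac{\int_{l_1}^{l_2}\int_{\alpha}^{+\infty} f(x,y)\,dx\,dy}{\int_{l_1}^{l_2}\int_{-\infty}^{+\infty} f(x,y)\,dx\,dy}, \qquad (1)$$

where

$$f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\, e^{-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}}$$

and both variables are expressed as standard measures. Equation (1)
may be simplified by the substitution

$$u = \frac{x - \rho y}{\sqrt{1-\rho^2}}.$$

It becomes

$$\text{prob.}(x \ge \alpha) = \frac{1}{\sqrt{2\pi}}\,\frac{\int_{l_1}^{l_2} e^{-\frac{y^2}{2}}\left[\int_{\frac{\alpha-\rho y}{\sqrt{1-\rho^2}}}^{+\infty} e^{-\frac{u^2}{2}}\,du\right]dy}{\int_{l_1}^{l_2} e^{-\frac{y^2}{2}}\,dy}. \qquad (2)$$

Suppose now that the interval l₂ − l₁ is small but finite. Then, in
view of the mean value theorem,

$$\text{prob.}(x \ge \alpha) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(\theta_1^2 - \theta_2^2)} \int_{\frac{\alpha-\rho\theta_1}{\sqrt{1-\rho^2}}}^{+\infty} e^{-\frac{u^2}{2}}\,du, \qquad (3)$$

where

$$l_1 < \theta_1 < l_2, \qquad l_1 < \theta_2 < l_2.$$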


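Keeping l₁ constant and letting l₂ approach l₁, we have

$$\theta_1 \to l_1, \qquad \theta_2 \to l_1,$$

and in the limit (3) becomes

$$\lim_{l_2 \to l_1} \text{prob.}(x \ge \alpha) = \frac{1}{\sqrt{2\pi}} \int_{\frac{\alpha-\rho l_1}{\sqrt{1-\rho^2}}}^{+\infty} e^{-\frac{u^2}{2}}\,du. \qquad (4)$$

Replacing l₁ in equation (4) by y, we find for the conditional probability

$$\text{prob.}(x \ge \alpha \mid y) = \pi(y) = \Phi\!\left(\frac{\alpha - \rho y}{\sqrt{1-\rho^2}}\right), \qquad (5)$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{+\infty} e^{-\frac{u^2}{2}}\,du,$$

the standard normal integral (the area under the normal curve to the right of x).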


Equation (5) is called the Applicant’s Operating Characteristic giv- 
ing an individual’s chance of selection to the vacancy* as a function 
of his true ability measure. 

Equation (5) could have been derived more simply. However, the
limit approach is preferable because it yields formula (2) as a
valuable by-product, namely the chance of selection of a group of
applicants all of whom fall within an ability interval.

If ρ = 0, that is, if we deal with pure chance selection, (5)
degenerates into

$$\pi(y) = \Phi(\alpha) = \text{constant}.$$
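Equation (5) is easy to tabulate. The sketch below is a minimal illustration under stated assumptions: it uses only the Python standard library, and because the paper’s Φ is the area to the *right* of its argument, it is computed as `0.5 * erfc(z / sqrt(2))`. The helper names `upper_normal` and `aoc` are hypothetical, not from the paper. Setting ρ = 0 reproduces the flat chance curve Φ(α).

```python
import math


def upper_normal(z: float) -> float:
    """The paper's Phi(z): area under the standard normal curve from z to +infinity."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def aoc(y: float, alpha: float, rho: float) -> float:
    """Applicant's Operating Characteristic, eq. (5): chance of selection for an
    applicant of true ability y (standard measure), predictor cut-off alpha,
    validity rho (|rho| < 1)."""
    return upper_normal((alpha - rho * y) / math.sqrt(1.0 - rho * rho))


if __name__ == "__main__":
    # The two curves of Figure 2: alpha = 1.0 with rho = .6 and with rho = 0.
    for y in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"y = {y:+.1f}   test: {aoc(y, 1.0, 0.6):.4f}   chance: {aoc(y, 1.0, 0.0):.4f}")
```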


The A.O.C. curves for α = 1.0, ρ = .6 and ρ = 0 have been drawn
in Figure 2. On the abscissa, standard scores (mean 50, σ = 10) were
plotted instead of standard measures. (“Standard Score” is symbolized
in the diagrams by “S/S.”) The graph clearly shows that an
individual’s probability of being selected increases steadily with
increasing true ability. The higher a man’s true potentialities to cope
with the job, the better are the odds of his being chosen. On the other
hand, chance selection does not take true abilities into account. As
may be seen from Figure 2, the chance of selection is the same for the
dull and the gifted.

It must not be inferred from the above that all non-test procedures
of selection produce pure chance A.O.C. curves. However, experimental
evidence indicates that non-test techniques give rise to very shallow
A.O.C. curves.

*This follows directly from the fact that an applicant will be selected if his
score x ≥ α. The cut-off α, of course, depends on the selection ratio.

[FIG. 2 APPLICANT’S OPERATING CHARACTERISTIC: probability of selection plotted against criterion scores y (S/S), for test selection with α = 1.0 (60 S/S), ρ = .6, and for chance selection]

For individuals of low true ability the chances of getting the job 
through test selection are minimal. Such candidates’ rejection is in 
their own interest as it will save them disappointments and frustra- 
tions at a later stage when they turn out to be occupational misfits. We 
may, therefore, conclude that far from being undemocratic it is in 
the individual’s own interest to be selected or rejected by means of 
a test battery. 


(b) The Selector’s Operating Characteristic (S.O.C. Curve)


After the construction of a reliable and valid test battery, the 
selector’s main interest is focussed on the correctness of his selec- 
tions. As correlations near unity are non-existent in the psychologi- 
cal prediction field, it will happen quite often that those recommended 
to the vacancies will not make the grade on the true ability variable. 
What we need is an assessment of the risk we run in selecting an
applicant with a given test or battery score x. The individual applicant’s
chance of success on the true ability scale is the probability of
equalling or exceeding a lower bound β of satisfactory ability. β may be
fixed by the employer at a standard known to him from previous
experience; it may be defined as the mean criterion score of an
untruncated sample; or it may be chosen arbitrarily by the selector himself.

The Selector’s Operating Characteristic may be derived in precisely
the same manner as the Applicant’s Operating Characteristic.
Reversal of the variables in equation (5) and substitution of the cut-off
point β on the true ability scale for α gives

$$\pi(x) = \Phi\!\left(\frac{\beta - \rho x}{\sqrt{1-\rho^2}}\right), \qquad (6)$$

which expresses the chance of success on the criterion variable as a
function of test scores. In Figure 3 the S.O.C. curve has been drawn
for β = −0.5 and ρ = .6. As may be seen, the higher an individual’s
test score, the greater his probability of success. The magnitude of
the slope of the central portion of the S.O.C. curve gives an idea of
the discriminating power of the test or battery. For ρ = 1 the curve
degenerates into a vertical line at test score x = β. In that case we
have perfect discrimination provided α = β. If the cut-off point on
the true ability scale (or its estimate, the criterion score) is changed
to β′, we shift the S.O.C. curve by (β′ − β)/ρ along the test-score axis
without altering its shape.

[FIG. 3 SELECTOR’S OPERATING CHARACTERISTIC: probability of success π(x) plotted against battery scores x (S/S), for β = −.5 (45 S/S)]
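The shift property can be checked numerically. The sketch below assumes only the Python standard library; `soc` is a hypothetical helper name, and the new cut-off β′ = 0 is chosen purely for illustration. It evaluates equation (6) and confirms that raising the criterion cut-off from β to β′ translates the curve by (β′ − β)/ρ along the test-score axis, and that the curve passes through one half at x = β/ρ.

```python
import math


def upper_normal(z: float) -> float:
    """The paper's Phi(z): standard normal area from z to +infinity."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def soc(x: float, beta: float, rho: float) -> float:
    """Selector's Operating Characteristic, eq. (6): chance that an applicant
    with test score x (standard measure) reaches the criterion cut-off beta."""
    return upper_normal((beta - rho * x) / math.sqrt(1.0 - rho * rho))


beta, rho = -0.5, 0.6          # the case drawn in Figure 3
beta_new = 0.0                 # an illustrative new criterion cut-off
shift = (beta_new - beta) / rho

for x in (-2.0, -0.5, 0.0, 1.5):
    # Same probability, read off the translated point of the new curve.
    assert abs(soc(x + shift, beta_new, rho) - soc(x, beta, rho)) < 1e-12

print(round(soc(beta / rho, beta, rho), 6))
```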


The S.O.C. curve based on equation (6) supplies the mathematical
model for expectancy charts. It is useful for prediction per se, for
















graduating observed expectancies, and for testing statistically wheth- 
er observed proportions of successes conform with or deviate from 
the hypotheses underlying expectancy charts. Equation (6) is not 
new. It has been used by McClelland (1); tables of it were computed
by Bittner and Wilder (2), and the whole subject of expectancy charts 
has been well discussed by Bingham (3) recently. 

In practice, observed proportions of successes are almost invari- 
ably based on grouped data. If the total range of test scores has been 
subdivided into ten or more intervals no serious error will be com- 
mitted by substituting the midpoints or the exact mean values of the 
class intervals into equation (6) for making comparisons with the 
theoretical values. However, current expectancy charts are frequent- 
ly based on five or fewer intervals. In that case it is safer to use the 
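exact formula, which we may write down from equation (2) by
interchanging the variables and replacing α by β. It is

$$\text{prob.}(y \ge \beta) = \frac{1}{\sqrt{2\pi}}\,\frac{\int_{l_1}^{l_2} e^{-\frac{x^2}{2}}\left[\int_{\frac{\beta-\rho x}{\sqrt{1-\rho^2}}}^{+\infty} e^{-\frac{u^2}{2}}\,du\right]dx}{\int_{l_1}^{l_2} e^{-\frac{x^2}{2}}\,dx}, \qquad (7)$$

where l₁ and l₂ are now the limits of a class interval of test scores.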


Numerical values of (7) may be found either from Pearson’s Tables 
for Statisticians and Biometricians (Vol. II) or by a mechanical quad- 
rature. 
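As a sketch of such a mechanical quadrature, the code below evaluates (7) with composite Simpson’s rule. Simpson’s rule is our choice (the paper does not name a formula), only the Python standard library is assumed, and the function names are hypothetical. It returns the expected proportion of successes among applicants whose test scores fall in a class interval (l₁, l₂), all in standard measures, and compares it with (6) evaluated at the interval midpoint.

```python
import math


def upper_normal(z: float) -> float:
    """The paper's Phi(z): standard normal area from z to +infinity."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_ordinate(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def simpson(f, a: float, b: float, n: int = 400) -> float:
    """Composite Simpson's rule on [a, b] with n panels (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def interval_expectancy(l1: float, l2: float, beta: float, rho: float) -> float:
    """Eq. (7): proportion with criterion y >= beta among applicants whose
    test score lies between l1 and l2."""
    s = math.sqrt(1.0 - rho * rho)
    num = simpson(lambda x: normal_ordinate(x) * upper_normal((beta - rho * x) / s), l1, l2)
    den = simpson(normal_ordinate, l1, l2)
    return num / den


if __name__ == "__main__":
    l1, l2, beta, rho = 0.0, 1.0, -0.5, 0.6
    midpoint_value = upper_normal((beta - rho * 0.5 * (l1 + l2)) / math.sqrt(1.0 - rho * rho))
    print(interval_expectancy(l1, l2, beta, rho), midpoint_value)
```

Over a very wide interval, (7) must reduce to the marginal proportion Φ(β), which gives a convenient check on the quadrature.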


(c) The Efficiency of a Test Battery 


(1) The Efficiency Index: The efficiency of a battery is the
employer’s chief concern. It is he who pays for the selection program
and, naturally, he is right to ask: Does it really pay?

Various devices have been proposed to answer his question. Taylor
and Russell (4) suggested comparing the proportion of successful
selectees with the proportions obtained from batteries with zero and
perfect validities. Brogden (5) introduced a measure of gain, namely
the absolute difference between the criterion means of a selected and an
unselected population of applicants. A similar method, only on a
percentage basis, had previously been described by Jarrett (6). Brogden
(5) also considers the gain in income from improved production
when the costs of large scale testing are taken into account. McClelland
(1) uses the proportion of “misfits,” that is, the proportion
of incorrectly accepted and incorrectly rejected applicants, as an
index of a battery’s efficiency. He proves that for a given β the
proportion of misfits is a minimum if

$$\Phi\!\left(\frac{\beta - \rho\alpha}{\sqrt{1-\rho^2}}\right) = \frac{1}{2},$$

and determines the predictor cut-off point α accordingly.
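Because Φ equals one half only at zero, McClelland’s condition gives β − ρα = 0, that is, α = β/ρ. The sketch below checks this numerically under stated assumptions: standard library only, hypothetical helper names, and Simpson’s rule as our choice of quadrature. It computes the proportion of misfits, P(x ≥ α, y < β) + P(x < α, y ≥ β), over a grid of predictor cut-offs and locates the minimum.

```python
import math


def upper_normal(z: float) -> float:
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_ordinate(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def simpson(f, a: float, b: float, n: int = 400) -> float:
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def both_above(alpha: float, beta: float, rho: float) -> float:
    """P(x >= alpha and y >= beta) for the standard normal bivariate."""
    s = math.sqrt(1.0 - rho * rho)
    # Integrate over x; the conditional chance of y >= beta is eq. (6).
    return simpson(lambda x: normal_ordinate(x) * upper_normal((beta - rho * x) / s),
                   alpha, alpha + 12.0)


def misfits(alpha: float, beta: float, rho: float) -> float:
    """Incorrectly accepted plus incorrectly rejected applicants."""
    j = both_above(alpha, beta, rho)
    return (upper_normal(alpha) - j) + (upper_normal(beta) - j)


beta, rho = -0.5, 0.6
grid = [i / 100.0 for i in range(-300, 301)]
best = min(grid, key=lambda a: misfits(a, beta, rho))
print(best, beta / rho)   # grid minimum versus McClelland's alpha = beta / rho
```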

All the above measures serve useful purposes. However, in personnel
selection the cost of testing is not always of importance, e.g.,
for scholarships or college entrance, and the concept of output gain
needs considerable stretching when applied to pilot selection. The 
minimum number of misfits is an excellent yardstick in classification 
problems but becomes of dubious value in a selection program. Ap- 
parently the most direct measure of efficiency and the one which 
makes sense in the majority of practical cases is the one originally 
proposed by Taylor and Russell. It is developed further here. 

A good measure of efficiency would be the percentage gain in the
number of successful battery-selected applicants over existing routine
selection practice. Very rarely do we have data to measure the success
(or otherwise) of non-test procedures. In an experimental setup
it should be possible to follow up an untruncated sample on which
both test and non-test selection have been carried out prior to induction
into the work process. In such an experiment it should be possible
to construct operating characteristics for both selection procedures.

Besides the lack of data, a further point militates
against the use of non-test procedures in gauging the efficacy of test
selection. Whereas for a given applicant population and a stable
criterion cut-off point β battery selection will produce comparable
results from place to place, this is unlikely to be the case for a non-test
procedure, in which the success of selection depends entirely on the
skill and human insight of the person hiring labour. For a scientific
measure of efficiency we need a fixed bench mark with which we can
compare the results of aptitude testing. It is for this reason that we
fall back on pure chance selection; we do not imply that
non-test procedures necessarily approach a lottery.

We shall define the “efficiency index” of a battery as the per- 
centage gain of successful test selected applicants over chance se- 
lected candidates. Expressed in other words, it is the number of suc- 
cessful candidates gained over chance selection for each hundred ap- 
plicants selected. We write 














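$$\text{Efficiency Index } H = 100\,(\pi_t - \pi_c)\,\%, \qquad (8)$$

where the proportion of successful test-selected applicants is

$$\pi_t = \frac{1}{\Phi(\alpha)} \int_{\beta}^{+\infty}\!\int_{\alpha}^{+\infty} f(x,y)\,dx\,dy \qquad (8a)$$

and the proportion of successful chance-selected applicants is

$$\pi_c = \Phi(\beta). \qquad (8b)$$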


f(x,y) is again the normal bivariate function. Equation (8a) may be 
evaluated with Pearson’s Tables or with a mechanical quadrature 
formula. 
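As a sketch of that quadrature (Simpson’s rule, our choice; standard library only; illustrative function names), the double integral in (8a) can be reduced to a single integral over y of the normal ordinate times the A.O.C. curve, which is the form the paper gives as equation (9).

```python
import math


def upper_normal(z: float) -> float:
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_ordinate(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def efficiency_index(alpha: float, beta: float, rho: float, y_max: float = 10.0) -> float:
    """H = 100 (pi_t - pi_c), eqs. (8), (8a), (8b), in per cent."""
    s = math.sqrt(1.0 - rho * rho)
    # Joint chance of being selected (x >= alpha) and successful (y >= beta).
    joint = simpson(lambda y: normal_ordinate(y) * upper_normal((alpha - rho * y) / s),
                    beta, y_max)
    pi_t = joint / upper_normal(alpha)   # success rate among test-selected applicants
    pi_c = upper_normal(beta)            # success rate under chance selection
    return 100.0 * (pi_t - pi_c)


if __name__ == "__main__":
    print(round(efficiency_index(1.0, -0.5, 0.6), 2))   # the case of Figures 5 and 6
```

With ρ = 0 the index vanishes, and for fixed β it grows as the selection ratio falls (α rises), as stated in the text.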


The predictor cutting score α will depend on the selection
ratio, and, as Taylor and Russell have previously shown, the lower the
selection ratio (and consequently the larger α), the greater will be the
efficiency index H.


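The criterion cutting score β is usually fixed by the employer.
However, where no such standard is known a priori, it is possible to
locate β in such a way that H becomes a maximum. To find the maxima
or minima, we first write equation (8) as

$$H = \frac{100}{\sqrt{2\pi}\,\Phi(\alpha)} \int_{\beta}^{+\infty} e^{-\frac{y^2}{2}}\,\Phi\!\left(\frac{\alpha - \rho y}{\sqrt{1-\rho^2}}\right) dy - 100\,\Phi(\beta) = \frac{100}{\Phi(\alpha)} \int_{\beta}^{+\infty} \operatorname{erf}(y)\,\pi(y)\,dy - 100\,\Phi(\beta), \qquad (9)$$

where

$$\operatorname{erf}(y) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}},$$

the ordinate of the standard normal curve. Hence

$$\frac{\partial H}{\partial \beta} = 100\left[-\frac{1}{\Phi(\alpha)}\operatorname{erf}(\beta)\,\pi(\beta) + \operatorname{erf}(\beta)\right] = 0,$$

$$\pi(\beta) = \Phi\!\left(\frac{\alpha - \rho\beta}{\sqrt{1-\rho^2}}\right) = \Phi(\alpha),$$

$$\frac{\alpha - \rho\beta}{\sqrt{1-\rho^2}} = \alpha,$$

and finally, for the maximum condition,

$$\beta_{\max} = \left(\frac{1 - \sqrt{1-\rho^2}}{\rho}\right)\alpha. \qquad (10)$$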


The formal proof that we are really dealing with a maximum has been
omitted. In Figure 4 equation (9) has been graphed for α = 1.0 and
ρ = .6. It brings out the important point that the efficiency of a
battery is conditioned not only by the selection ratio and the correlation
coefficient but also, very much, by the criterion cutting score β. For a
given correlation and selection ratio the efficiency decreases rapidly on
either side of the optimal cut-off point β_max.

[FIG. 4 BATTERY EFFICIENCY H AS A FUNCTION OF THE CRITERION CUTTING SCORE β, for α = 1.0 (60 S/S) and ρ = .6; horizontal axis: criterion cutting score β (S/S)]
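For α = 1.0 and ρ = .6, equation (10) gives β_max = (1 − .8)/.6 = 1/3, about 53 S/S. The sketch below (standard library only; hypothetical names; Simpson’s rule our choice of quadrature) compares that closed form with a brute-force search of H over a grid of β values.

```python
import math


def upper_normal(z: float) -> float:
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_ordinate(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def simpson(f, a: float, b: float, n: int = 800) -> float:
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def efficiency_index(alpha: float, beta: float, rho: float, y_max: float = 10.0) -> float:
    """Eq. (9): H in per cent."""
    s = math.sqrt(1.0 - rho * rho)
    joint = simpson(lambda y: normal_ordinate(y) * upper_normal((alpha - rho * y) / s),
                    beta, y_max)
    return 100.0 * (joint / upper_normal(alpha) - upper_normal(beta))


def beta_max(alpha: float, rho: float) -> float:
    """Eq. (10): criterion cut-off that maximizes H for fixed alpha and rho."""
    return alpha * (1.0 - math.sqrt(1.0 - rho * rho)) / rho


alpha, rho = 1.0, 0.6                        # the case graphed in Figure 4
grid = [i / 100.0 for i in range(-200, 201)]
best = max(grid, key=lambda b: efficiency_index(alpha, b, rho))
print(best, round(beta_max(alpha, rho), 4))  # grid optimum vs. eq. (10)
```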

(2) The Quality-Gain Diagram: A comparison of the criterion 
distribution of the selected group of applicants (shown as the dotted 
curve in Figure 1) with that of an unselected group is of importance 
as it may be used to combine the usefulness of the efficiency index H 
with a graphical description of the over-all improvement in quality 
performance of battery-selected personnel. The true ability distribution
of the unselected group is

$$\operatorname{erf}(y)\,dy,$$

and the probability of being selected through a test is

$$\pi(y) = \Phi\!\left(\frac{\alpha - \rho y}{\sqrt{1-\rho^2}}\right)$$

for a given level of true ability y. The total number of persons
selected is NΦ(α). Hence the required probability distribution of
criterion scores (being the estimate of true ability) is

$$\phi(y)\,dy = \frac{1}{\Phi(\alpha)}\,\operatorname{erf}(y)\,\pi(y)\,dy. \qquad (11)$$

Jarrett (6) has previously shown that the standard deviation
of the criterion distribution of the selectees differs but little from that
of the unrestricted criterion distribution. Even more striking is the
fact that the criterion distribution stays nearly normal in spite of
heavy truncation on the predictor variable. We may prove this
by deriving the first four moments of (11).

The mean criterion score of applicants having predictor score x is

$$\bar{y} = \rho x,$$

and the mean criterion score of all selectees is

$$M_1' = \frac{\rho}{\Phi(\alpha)} \int_{\alpha}^{+\infty} x\,\operatorname{erf}(x)\,dx = \rho\, m_1', \qquad (12)$$

where m₁′ is the mean of the truncated tail about the origin. The
first four moments of individual array distributions about the array
means are

$$\mu_1 = 0, \qquad \mu_2 = 1 - \rho^2, \qquad \mu_3 = 0, \qquad \mu_4 = 3(1-\rho^2)^2. \qquad (13)$$


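We now take the moments of individual array distributions about
M₁′. Central moments may be transferred to other axes by the well
known formula

$$\nu_n = \mu_n + n\,d\,\mu_{n-1} + \frac{n(n-1)}{2!}\,d^2\,\mu_{n-2} + \cdots,$$

where d is the distance between the centroid and the new axis. In
our case

$$d = \bar{y} - M_1' = \rho\,(x - m_1'),$$

hence

$$\nu_n = \mu_n + n\,\mu_{n-1}\,\rho\,(x - m_1') + \frac{n(n-1)}{2!}\,\mu_{n-2}\,\rho^2 (x - m_1')^2 + \cdots.$$

Summing the arrays x ≥ α, we find

$$M_n = \frac{1}{\Phi(\alpha)} \int_{\alpha}^{+\infty} \left[\mu_n + n\,\mu_{n-1}\,\rho\,(x - m_1') + \frac{n(n-1)}{2!}\,\mu_{n-2}\,\rho^2 (x - m_1')^2 + \cdots\right] \operatorname{erf}(x)\,dx = \mu_n + \frac{n(n-1)}{2!}\,\mu_{n-2}\,\rho^2 m_2 + \frac{n(n-1)(n-2)}{3!}\,\mu_{n-3}\,\rho^3 m_3 + \cdots, \qquad (15)$$

where m_n is the nth central moment of the predictor scores of the
selected group (the term in m₁ vanishes because m₁ = 0). Substitution
of (13) into (15) gives

$$M_2 = 1 - \rho^2 + \rho^2 m_2, \qquad M_3 = \rho^3 m_3, \qquad M_4 = 3(1-\rho^2)^2 + 6(1-\rho^2)\rho^2 m_2 + \rho^4 m_4, \qquad (16)$$

and finally, for the skewness and kurtosis of the φ(y) distribution,

$$\beta_{1y} = \left(\frac{\rho^2 m_2}{1 - \rho^2 + \rho^2 m_2}\right)^{3} \beta_{1x}, \qquad (17)$$

$$\beta_{2y} = 3 + \left(\frac{\rho^2 m_2}{1 - \rho^2 + \rho^2 m_2}\right)^{2} (\beta_{2x} - 3), \qquad (18)$$

where subscripts x refer to the truncated predictor distribution,
subscripts y to the resulting criterion distribution of selectees, and

$$\beta_{1y} = \frac{M_3^2}{M_2^3}, \qquad \beta_{1x} = \frac{m_3^2}{m_2^3}, \qquad \beta_{2y} = \frac{M_4}{M_2^2}, \qquad \beta_{2x} = \frac{m_4}{m_2^2}.$$

For the special case of α = 0,

$$m_2 = \frac{\pi - 2}{\pi}, \qquad \beta_{1x} = \frac{2(4-\pi)^2}{(\pi-2)^3}, \qquad \beta_{2x} = \frac{3\pi^2 - 4\pi - 12}{(\pi-2)^2}. \qquad (19)$$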








Equations (16), (17), and (18) express the parameters of the 
criterion distribution of selectees as functions of similar parameters 
of the truncated predictor distribution. We may, therefore, compute 
numerical values by making use of Pearson’s Tables of the moments 
of a truncated normal distribution. Table 1 gives some numerical 
values for ρ = .6, illustrating clearly that in spite of low selection
ratios the criterion distribution is never far from normality, i.e.,
from β₁ = 0 and β₂ = 3. It also shows that the selected group does not
become very much more homogeneous than an unselected group would
be, because its standard deviation of true ability shrinks only a little
even under strong curtailment on the predictor variable.

[FIG. 5 QUALITY GAIN DIAGRAM: distributions of criterion scores y (S/S) for chance-selected and battery-selected applicants, divided into successes and failures at β; ρ = .6, β = −.5 (45 S/S), α = 1.0 (60 S/S)]

Such deviations from normality as exist could be detected neither
by a χ² test nor by a significance test for the betas for a selected
group of N = 1000. Only for correlation coefficients far in excess of
those currently obtained would the criterion distribution of the selectees
become markedly skew and peaked.

The Quality-Gain Diagram is obtained by plotting the probability 
distributions of true ability for the unselected and selected group. 
The shift of distributions against each other depicts graphically how 
much is gained by selecting a given number of applicants by aptitude 
testing in comparison to chance selection. The areas lopped off by the 
vertical axis through the cutting score β on the criterion variable give
the wastage rates of the selectees for either method of selection. The
difference of these areas is the efficiency index H. A Quality-Gain
Diagram has been drawn for ρ = .6, α = 1.0, β = −0.5 in Figure 5.
The scale unit used is again the normalized standard score, S/S. 


TABLE 1

Predictor   Selection   S.D. of      S.D. of      Skewness of   Kurtosis of   Mean of
Cut-Off     Ratio       Predictor    Criterion    Criterion     Criterion     Criterion
                        Scores       Scores       Scores        Scores        Scores
α           Φ(α)        σ_x          σ_y          β₁y           β₂y           M₁′

−∞          1           1            1            0             3             0
0           .5000       .6028        .8780        .00484        3.02503       .4787
+1          .1586       .4462        .8436        .00177        3.02026       .9151
+2          .0228       .3380        .8253        .00052        3.01095       1.4239
+∞          0           0            .8           0             3             ∞

ρ = .6
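Table 1 can be recomputed without Pearson’s tables. The sketch below assumes only the Python standard library, and its names are hypothetical. It obtains the moments of the truncated predictor tail from the recurrence E[xⁿ] = (n − 1)E[xⁿ⁻²] + αⁿ⁻¹λ, with λ = erf(α)/Φ(α). This recurrence is a standard property of the truncated normal and is not stated in the paper. The sketch then applies (12) and (16) to (18).

```python
import math


def upper_normal(z: float) -> float:
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_ordinate(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def truncated_tail_moments(alpha: float):
    """Mean and 2nd-4th central moments of a standard normal truncated to x >= alpha."""
    lam = normal_ordinate(alpha) / upper_normal(alpha)
    e1 = lam                                    # raw moments from the recurrence
    e2 = 1.0 + alpha * lam
    e3 = (alpha ** 2 + 2.0) * lam
    e4 = 3.0 + (alpha ** 3 + 3.0 * alpha) * lam
    m2 = e2 - e1 ** 2                           # central moments
    m3 = e3 - 3.0 * e1 * e2 + 2.0 * e1 ** 3
    m4 = e4 - 4.0 * e1 * e3 + 6.0 * e1 ** 2 * e2 - 3.0 * e1 ** 4
    return e1, m2, m3, m4


def selectee_criterion(alpha: float, rho: float) -> dict:
    """One row of Table 1 from eqs. (12) and (16)-(18)."""
    m1, m2, m3, m4 = truncated_tail_moments(alpha)
    big_m2 = 1.0 - rho ** 2 + rho ** 2 * m2
    big_m3 = rho ** 3 * m3
    big_m4 = (3.0 * (1.0 - rho ** 2) ** 2
              + 6.0 * (1.0 - rho ** 2) * rho ** 2 * m2 + rho ** 4 * m4)
    return {
        "selection_ratio": upper_normal(alpha),
        "sd_x": math.sqrt(m2),
        "sd_y": math.sqrt(big_m2),
        "beta1_y": big_m3 ** 2 / big_m2 ** 3,
        "beta2_y": big_m4 / big_m2 ** 2,
        "mean_y": rho * m1,
    }


if __name__ == "__main__":
    for a in (0.0, 1.0, 2.0):
        row = selectee_criterion(a, 0.6)
        print(a, {k: round(v, 5) for k, v in row.items()})
```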


(3) The Cost-Utility Diagram: Although the proportion of suc- 
cessful selectees is the most important figure in any selection proce- 
dure, the employer undoubtedly will also be interested in the num- 
ber of rejected applicants who had the required abilities for the job. 
In fact, from the general manpower point of view, selection of the
capable and rejection of the incapable candidates is achieved only at
the cost of accepting some applicants who do not make the grade and
turning down quite a few others who would have proved successful
if selected.













Berkson (7) has introduced the concepts of “cost” and “utility”
of a test. To the author it seems almost impossible to give a general
formula for balancing utility against cost, as circumstances differ so
much from one selection program to another. For example, we may
not mind rejecting many capable applicants for the vacancy of a
train driver as long as the one we select is a good one. On the other
hand, the number of rejected capable candidates for pilot training
would be of major consequence in a national emergency. We may,
however, show unambiguously how battery selection divides the
applicant group into four distinct classes once α, β and ρ
are determined.

By graphing the true ability distribution of the complete popula- 
tion of applicants and the distribution of the selected candidates, we 
arrive at the “Cost-Utility Diagram,” this name being borrowed from 
Berkson. It is to be noted that for this particular purpose the criterion
distribution of selectees is to be taken as

$$\psi(y)\,dy = \Phi(\alpha)\,\phi(y)\,dy = \operatorname{erf}(y)\,\pi(y)\,dy, \qquad (20)$$

making the ratio of the areas under the two curves equal to the
selection ratio. The distributions are shifted against each other similarly
to the displacement in the “Quality-Gain Diagram.” The vertical axis
through the cutting score β on the true ability scale divides the area
under the total applicant distribution into four segments proportional to

(1) the number of capable applicants selected by battery,
(2) the number of capable applicants rejected by battery,
(3) the number of incapable applicants selected by battery, and
(4) the number of incapable applicants rejected by battery.

In Figure 6 a Cost-Utility Diagram is shown for α = 1.0, β = −0.5,
ρ = .6.

[FIG. 6 COST-UTILITY DIAGRAM: distributions of criterion scores y (S/S) for all applicants and for battery-selected applicants, divided at β into selected and capable, rejected and capable, selected and incapable, rejected and incapable; β = −.5 (45 S/S), α = 1.0 (60 S/S)]
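The four segments of the Cost-Utility Diagram are simple bivariate normal probabilities once α, β and ρ are fixed. The sketch below computes them for the case of Figure 6. It assumes only the Python standard library, the function names are hypothetical, and Simpson’s rule is our choice of quadrature.

```python
import math


def upper_normal(z: float) -> float:
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_ordinate(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def cost_utility(alpha: float, beta: float, rho: float, y_max: float = 10.0) -> dict:
    """Fractions of all applicants falling into the four classes of Figure 6."""
    s = math.sqrt(1.0 - rho * rho)
    # Area under eq. (20) to the right of beta: selected and capable.
    sel_cap = simpson(lambda y: normal_ordinate(y) * upper_normal((alpha - rho * y) / s),
                      beta, y_max)
    selected = upper_normal(alpha)   # selection ratio
    capable = upper_normal(beta)     # proportion at or above the criterion cut-off
    return {
        "selected_capable": sel_cap,
        "rejected_capable": capable - sel_cap,
        "selected_incapable": selected - sel_cap,
        "rejected_incapable": 1.0 - selected - capable + sel_cap,
    }


if __name__ == "__main__":
    for name, share in cost_utility(1.0, -0.5, 0.6).items():
        print(f"{name:20s} {100 * share:6.2f} per cent of applicants")
```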

It is hoped that the “Cost-Utility Diagram” will prove of value 
to the selector in putting the usefulness of a test battery across to 
management. Although it complies with rigorous scientific criteria, 
it does not require the understanding of a correlation coefficient or 
the explanation of probability. In spite of its simplicity it gives all 
the vital information in which the employer is interested. 


II. The Statistical Estimation
(a) The Confidence Belt of the Operating Characteristics


The mathematical models of operating characteristics can be 
tested against actual data only if we are in a position to calculate the 
standard errors of their estimates. The two operating characteristics 
described in the previous section are analytically identical. We shall 
derive the equation for the probability limits of the S.O.C. curve. The
whole argument is also applicable to the A.O.C. curve provided we
interchange the variables and replace β by α.

Formulas derived previously were all given in standard meas- 
ures. In practice we usually deal with raw test and criterion scores. 
Consequently we have to rewrite the basic equation (6) as 




















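$$\pi(x) = \Phi\!\left\{\frac{1}{\sqrt{1-\rho^2}}\left[\left(\frac{L - \eta}{\sigma_y}\right) - \rho\left(\frac{x - \xi}{\sigma_x}\right)\right]\right\} = \Phi\!\left(\frac{\beta - \rho X}{\sqrt{1-\rho^2}}\right), \qquad (21)$$

where

$$X = \frac{x - \xi}{\sigma_x}, \qquad (21a)$$

$$\beta = \frac{L - \eta}{\sigma_y}, \qquad (21b)$$

x being a raw test score, L the raw criterion cutting score, ξ and η the
population means, and σ_x and σ_y the population standard deviations.
Equation (21a) amounts to a linear parametric transformation.
Hence

$$\pi(X) = \pi(x).$$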























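We estimate π(x) by

$$p(x) = \Phi\!\left\{\frac{1}{\sqrt{1-r^2}}\left[\left(\frac{L - \bar{y}}{s_y}\right) - r\left(\frac{x - \bar{x}}{s_x}\right)\right]\right\}. \qquad (22)$$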











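We require the variance of p(x). For large samples we have, for the
variance of a function φ involving several statistical estimates,

$$\operatorname{var}(\varphi) = \sum_{i} \left(\frac{\partial \varphi}{\partial z_i}\right)^2 \operatorname{var}(z_i) + \sum_{i \ne j} \frac{\partial \varphi}{\partial z_i}\,\frac{\partial \varphi}{\partial z_j}\,\operatorname{cov}(z_i, z_j), \qquad (23)$$

where z_j is the jth statistic involved. After finding the various partial
derivatives in (22) and replacing the statistics by their expectations,
we have from (23)

$$\operatorname{var} p(x) = \frac{1}{1-\rho^2}\,\operatorname{erf}^2\!\left(\frac{\beta - \rho X}{\sqrt{1-\rho^2}}\right) \Bigg\{ \left(\frac{\rho\beta - X}{1-\rho^2}\right)^2 \operatorname{var}(r) + \frac{\operatorname{var}(\bar{y})}{\sigma_y^2} + \left(\frac{\rho}{\sigma_x}\right)^2 \operatorname{var}(\bar{x}) + \left(\frac{\beta}{\sigma_y}\right)^2 \operatorname{var}(s_y) + \left(\frac{\rho X}{\sigma_x}\right)^2 \operatorname{var}(s_x) - \frac{2\beta(\rho\beta - X)}{\sigma_y(1-\rho^2)}\operatorname{cov}(r, s_y) + \frac{2\rho X(\rho\beta - X)}{\sigma_x(1-\rho^2)}\operatorname{cov}(r, s_x) - \frac{2\rho}{\sigma_x\sigma_y}\operatorname{cov}(\bar{y}, \bar{x}) - \frac{2\rho\beta X}{\sigma_x\sigma_y}\operatorname{cov}(s_y, s_x) \Bigg\}. \qquad (24)$$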

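In a normal bivariate distribution the estimates of the means
are independent of the estimates of the standard deviations and the
correlation coefficient. For this reason six covariances vanish in (23).
Further, we have in large sample theory for a normal bivariate
distribution

$$\operatorname{var}(r) = \frac{(1-\rho^2)^2}{N}, \quad \operatorname{var}(\bar{y}) = \frac{\sigma_y^2}{N}, \quad \operatorname{var}(\bar{x}) = \frac{\sigma_x^2}{N}, \quad \operatorname{var}(s_y) = \frac{\sigma_y^2}{2N}, \quad \operatorname{var}(s_x) = \frac{\sigma_x^2}{2N},$$

and

$$\operatorname{cov}(r, s_y) = \frac{\rho\,\sigma_y(1-\rho^2)}{2N}, \quad \operatorname{cov}(r, s_x) = \frac{\rho\,\sigma_x(1-\rho^2)}{2N}, \quad \operatorname{cov}(\bar{y}, \bar{x}) = \frac{\rho\,\sigma_y\sigma_x}{N}, \quad \operatorname{cov}(s_y, s_x) = \frac{\rho^2\sigma_y\sigma_x}{2N}.$$

[See (8) page 38.] Substitution of these expressions into (24) gives

$$\operatorname{var} p(x) = \frac{1}{2N(1-\rho^2)}\left[2(1-\rho^2) + \beta^2 - 2\rho\beta X + (2-\rho^2)X^2\right]\operatorname{erf}^2\!\left(\frac{\beta - \rho X}{\sqrt{1-\rho^2}}\right). \qquad (25)$$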


Let us examine equation (25) a little more closely. For large positive
or negative values of X the variance of the proportion of successes
becomes very small, because the squared normal ordinate falls off faster
than the quadratic factor grows. This means that we can make increasingly
accurate statements with respect to expectancies in the applicant
population at the extremes of the predictor scale.


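Differentiating (25) with respect to X and equating to zero gives

$$X^3 - \frac{\beta(2+\rho^2)}{\rho(2-\rho^2)}\,X^2 - \frac{(1-\rho^2)(2-3\rho^2) - 3\rho^2\beta^2}{\rho^2(2-\rho^2)}\,X - \frac{\beta(\beta^2 - \rho^2 + 1)}{\rho(2-\rho^2)} = 0. \qquad (26)$$

Eq. (26) has one or three real roots according as D is positive or
negative, where

$$D = \left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3, \qquad (27)$$

$$q = -\frac{2(1-\rho^2)^2(2+3\rho^2)}{3\rho^3(2-\rho^2)^2}\,\beta - \frac{8(1-\rho^2)^2(2+7\rho^2)}{27\rho^3(2-\rho^2)^3}\,\beta^3,$$

$$p = -\frac{(1-\rho^2)(2-3\rho^2)}{\rho^2(2-\rho^2)} - \frac{2(1-\rho^2)(2-5\rho^2)}{3\rho^2(2-\rho^2)^2}\,\beta^2,$$

p and q being the coefficients of the reduced form of the cubic (26)
(not to be confused with the sample proportion p(x)).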


If 


D > 0 we have one maximum of var p(x), 














HERBERT S. SICHEL 19 


D=0we have one maximum and one point of in- 
flexion, with horizontal tangent, 
D <0 we have two maxima and one minimum. 


The latter case is interesting, as the upper and lower probability limits then draw together somewhere between the extremes of the score range. In Figure 7 this type of probability limit is indicated for β = 0, ρ = .6, N = 400. From the population S.O.C. curve π(x), 2.58{var p(x)}^{1/2} was laid off on either side. Where values above unity or below zero occurred they were taken as unity or zero. This must happen at the extremes of the scale, because there p(x) cannot be distributed normally about π(x) even for large N. At any given score x, an estimated S.O.C. curve based on one particular sample should lie outside these limits only once in a hundred times.

FIG. 7. PROBABILITY LIMITS OF TWO METHODS OF ESTIMATION (S.O.C. curve with probability limits from statistical estimation and from expectancy charts; β = 0, ρ = .6, N = 400; horizontal axis: battery scores x)

The above argument was based on knowledge of the population S.O.C. curve, which we usually do not have. Taking the S.O.C. curve based on the particular sample we deal with as the best estimate, we can lay off 2.58{var p(x)}^{1/2} from p(x) and state that, in repeated inferences of a similar kind, the population S.O.C. curve π(x) will in the long run lie outside limits derived in this way only once in a hundred times. The area between the limits is the 99% confidence belt for estimating true expectancies π(x).
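The classification of the confidence belt by the sign of D can be reproduced numerically. The sketch below builds the coefficients of (26), evaluates D from (27) through the usual reduction of a cubic to the form t³ + pt + q = 0, solves for the real roots, and lets one confirm that each root is a stationary point of (25). The function names are hypothetical, and the root formulas are the standard Cardano and trigonometric ones rather than anything prescribed in the text.

```python
import math


def var25(X, rho, beta):
    """Equation (25) with N = 1 (N only rescales the curve)."""
    quad = (X * X - 2 * rho * beta / (2 - rho**2) * X
            + (beta**2 + 2 * (1 - rho**2)) / (2 - rho**2))
    u = (beta - rho * X) / math.sqrt(1 - rho**2)
    ordinate = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return (2 - rho**2) / (2 * (1 - rho**2)) * quad * ordinate**2


def reduced_cubic(rho, beta):
    """Coefficients of (26), X^3 + a2 X^2 + a1 X + a0 = 0, reduced to
    t^3 + p t + q = 0 with X = t + shift."""
    r2 = rho * rho
    a2 = -beta * (2 + r2) / (rho * (2 - r2))
    a1 = -(2 * (1 - r2) - 3 * r2 * (beta**2 - r2 + 1)) / (r2 * (2 - r2))
    a0 = -beta * (beta**2 - r2 + 1) / (rho * (2 - r2))
    p = a1 - a2 * a2 / 3
    q = 2 * a2**3 / 27 - a2 * a1 / 3 + a0
    return p, q, -a2 / 3


def discriminant(rho, beta):
    """D of equation (27)."""
    p, q, _ = reduced_cubic(rho, beta)
    return (q / 2) ** 2 + (p / 3) ** 3


def stationary_points(rho, beta):
    """Real roots of (26): abscissas of the extrema of var p(x)."""
    p, q, shift = reduced_cubic(rho, beta)
    D = (q / 2) ** 2 + (p / 3) ** 3
    if D > 0:                                  # one real root (Cardano)
        s = math.sqrt(D)
        cbrt = lambda v: math.copysign(abs(v) ** (1 / 3), v)
        return [cbrt(-q / 2 + s) + cbrt(-q / 2 - s) + shift]
    if p == 0:                                 # triple root
        return [shift]
    m = 2 * math.sqrt(-p / 3)                  # three real roots (trigonometric form)
    c = max(-1.0, min(1.0, 3 * q / (p * m)))
    theta = math.acos(c) / 3
    return sorted(m * math.cos(theta - 2 * math.pi * k / 3) + shift
                  for k in range(3))


print(discriminant(0.6, 0.0), stationary_points(0.6, 0.0))
print(discriminant(0.5906, -0.5), stationary_points(0.5906, -0.5))
```

For β = 0 and ρ = .6 the discriminant is negative and the roots lie at X = 0 and about ±1.0, which matches the central constriction of Figure 7; for the rounded parameters used later in the application (ρ = .5906, β = −.5) the discriminant is positive.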

Seen from the viewpoint of estimation, the present usage of expectancy charts is a very inefficient procedure. For each group of applicants scoring l₁ < x < l₂ on the predictor, an estimate of the population proportion of successes π, given in equation (7), is furnished by computing the proportion of observed successes p_E in each class interval. If the interval contains n applicants, the variance of such an estimate is

$$\operatorname{var} p_E = \frac{1}{n}\,\pi(1-\pi).$$


If the class interval is not too coarse we have

$$p_E = p_E(x) = \text{estimate of } \pi(X)$$

and

$$\operatorname{var} p_E(x) = \frac{1}{n}\,\pi(X)\,[1-\pi(X)], \tag{28}$$

where x is the class midpoint, i.e., x = ½(l₁ + l₂).

Both Bittner's paper and the present one suggest estimating the same π(X) by p(x) [equation (22)]. The statistical efficiency of the old method in comparison with the one advocated here is

$$\text{Eff. } p_E(x) = \frac{\operatorname{var} p(x)}{\operatorname{var} p_E(x)}. \tag{29}$$


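If N, the total number of applicants, is large and the class interval width W (in standard measures) reasonably small, we may write, without introducing any serious error in (28),

$$n = NW\operatorname{erf}(X),$$

and equation (29) becomes

$$\text{Eff. } p_E(x) = \frac{W}{2}\left(1+\frac{1}{1-\rho^2}\right)\frac{\operatorname{erf}(X)}{\pi(X)\,[1-\pi(X)]}\left[X^2 - \frac{2\rho\beta}{2-\rho^2}X + \frac{\beta^2+2(1-\rho^2)}{2-\rho^2}\right]\left\{\operatorname{erf}\left(\frac{\beta-\rho X}{\sqrt{1-\rho^2}}\right)\right\}^2. \tag{30}$$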


From equation (30) the statistical efficiencies were calculated for several values of X, with β = 0, ρ = .6, and a class-interval width W = 0.5. From equations (28) and (25) the standard deviations of the two estimates of π(X), multiplied by t = 2.58, were also computed, with N taken as 400. All results are given in Table 2.


























TABLE 2

Standard   Efficiency of     2.58 σ[p_E(x)]     2.58 σ[p(x)]       π(x)
Measure    p_E(x)            (β = 0, ρ = .6,    (β = 0, ρ = .6,    (β = 0, ρ = .6)
X          (β = 0, ρ = .6)   N = 400)           N = 400)
0          .127              .145               .051               .500
.5         .140              .148               .054               .646
1.0        .143              .155               .059               .773
1.5        .100              .170               .054               .870
2.0        .044              .193               .041               .933
2.5        .013              .231               .027               .970
3.0        .002              .289               .012               .988


Only positive deviations of X are given in the table above; since β = 0, symmetry gives identical numerical values for negative deviations. The statistical efficiencies of the estimator p_E(x) are very low in comparison with those of the estimator p(x). For example, the efficiency at X = 0 is .127, so to reach the same accuracy of estimate for the population proportion of successes in the test score interval −.25 < X < +.25, the total number of applicants N must be about 1/.127 ≈ 8 times greater with the current method of expectancy chart inference than with equation (22).
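The efficiency column of Table 2 can be recomputed directly from (25), (28) and (30). The sketch below assumes β = 0, ρ = .6, N = 400 and W = 0.5 as in the text, and uses the approximation n = NW erf(X) throughout; the 2.58 σ[p_E(x)] values it prints may therefore differ from the tabulated ones in the last digit. Function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal ordinate (the text's erf)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """Upper-tail normal area (the text's Phi)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def soc(X, rho, beta):
    """Population S.O.C. curve pi(X)."""
    return upper_tail((beta - rho * X) / math.sqrt(1 - rho**2))


def var_p(X, rho, beta, N):
    """Equation (25)."""
    quad = (X * X - 2 * rho * beta / (2 - rho**2) * X
            + (beta**2 + 2 * (1 - rho**2)) / (2 - rho**2))
    u = (beta - rho * X) / math.sqrt(1 - rho**2)
    return (2 - rho**2) / (2 * N * (1 - rho**2)) * quad * phi(u) ** 2


def var_pE(X, rho, beta, N, W):
    """Equation (28) with the approximation n = N W erf(X)."""
    pi = soc(X, rho, beta)
    return pi * (1 - pi) / (N * W * phi(X))


def efficiency(X, rho, beta, W):
    """Equations (29)-(30); N cancels, so N = 1 is used."""
    return var_p(X, rho, beta, 1) / var_pE(X, rho, beta, 1, W)


print(" X   Eff.   2.58sd[pE]  2.58sd[p]  pi")
for X in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"{X:3.1f}  {efficiency(X, 0.6, 0.0, 0.5):.3f}"
          f"  {2.58 * math.sqrt(var_pE(X, 0.6, 0.0, 400, 0.5)):.3f}"
          f"      {2.58 * math.sqrt(var_p(X, 0.6, 0.0, 400)):.3f}"
          f"      {soc(X, 0.6, 0.0):.3f}")
```

The reciprocal of the printed efficiency at X = 0 (about 7.9) is the factor quoted above for the required increase in N.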

The reason for the relative inefficiency of the current method of 
estimation is the small amount of information furnished by the num- 
ber of applicants in one particular score interval only. On the other 
hand equation (22) makes effective use of all the information con- 
tained in the total sample N even when estimating the proportion of 
successes for one specific score interval. 

The respective probability limits and the S.O.C. curve for β = 0, ρ = .6, and N = 400 are drawn in Figure 7. The graph illustrates strikingly how much may be gained by using a statistically efficient method of estimation.


(b) The Standard Error of the Estimated Mean Criterion Score of Selectees

The standard error of the estimated mean criterion score of selectees has been discussed in Jarrett's paper (6). The formula given there is, as the author points out, a rough approximation, since it does not take into account sampling fluctuations of the correlation coefficient and the standard deviations. Furthermore, it refers to an estimate based only on Nπ_s, the number of selectees. Consequently a great deal of information is sacrificed, because such a method of estimation makes no use of the data furnished by the remaining N(1 − π_s) applicants. In the following, the large-sample variance of an estimate based on the total number of applicants is derived.

It has been mentioned previously that the cutting score a on the predictor variable depends on the selection ratio π_s. Theoretically we ought to find a from the parametric relation

$$\pi_s = \Phi(a),$$

and the raw predictor cutting score from

$$\lambda = a\sigma_x + \xi.$$

The practical approach, however, is slightly different. From the mean and standard deviation of the first sample we estimate

$$\lambda' = a s_x + \bar x.$$

Having thus determined λ', we keep it fixed in future applications of the test or battery. λ', which originally was an estimate of λ, becomes a parameter of its own, but

$$\Phi\left(\frac{\lambda'-\xi}{\sigma_x}\right) \neq \pi_s.$$


Hence in practice the selection ratio obtained will differ from π_s when selection takes place at a fixed cut-off λ'. The difference will not be very pronounced if the total number of applicants N on which x̄ and s_x are based is large. The raw score criterion mean of the selectees in the population of applicants is

$$\eta_s = \rho\,\sigma_y\,\frac{\operatorname{erf}\left(\frac{\lambda'-\xi}{\sigma_x}\right)}{\Phi\left(\frac{\lambda'-\xi}{\sigma_x}\right)} + \eta, \tag{31}$$

and is estimated by

$$\bar y_s = r\,s_y\,\frac{\operatorname{erf}\left(\frac{\lambda'-\bar x}{s_x}\right)}{\Phi\left(\frac{\lambda'-\bar x}{s_x}\right)} + \bar y. \tag{32}$$

The variance of ȳ_s may be derived in the same way as var p(x), using equation (23) and the same variances and covariances for the five statistics involved. We finally have for the variance of the estimate of the mean criterion score of selectees

$$\operatorname{var}(\bar y_s) = \left[1 + \left(1-\frac{\rho^2}{2}\right)m_x'^2 + \rho^2\left(1+\frac{a'^2}{2}\right)(m_x'-a')^2\,m_x'^2 - \rho^2\,(2 + a'm_x')\,(m_x'-a')\,m_x'\right]\frac{\sigma_y^2}{N}, \tag{33}$$

where m_x' and a' are the mean and cut-off point of the truncated normal distribution expressed in standard measures. For the special case of a = 0 and ρ = .6, the variance of ȳ_s is only 6/10 of the variance given by Jarrett, even though the sampling fluctuations of all five statistics have been taken into account. The reason for this greater efficiency is, as already mentioned, the larger number of applicants contributing to the estimation process.
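A short numerical sketch of (32) and (33) follows. It assumes the rounded illustration values used in Part III (ȳ = x̄ = 50, s_y = s_x = 10, R = .5906, λ' = 60 S/S, N = 287), with erf read as the normal ordinate and Φ as the upper-tail area; the function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal ordinate (the text's erf)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """Upper-tail normal area (the text's Phi)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def selectee_mean(r, s_y, ybar, lam, xbar, s_x):
    """Estimated criterion mean of selectees, equation (32)."""
    a = (lam - xbar) / s_x
    return r * s_y * phi(a) / upper_tail(a) + ybar


def var_selectee_mean(rho, sigma_y, a, N):
    """Large-sample variance of the selectee mean, equation (33)."""
    m = phi(a) / upper_tail(a)     # m_x': mean of the standard normal truncated below at a
    r2 = rho * rho
    bracket = (1
               + (1 - r2 / 2) * m * m
               + r2 * (1 + a * a / 2) * (m - a) ** 2 * m * m
               - r2 * (2 + a * m) * (m - a) * m)
    return bracket * sigma_y**2 / N


mean = selectee_mean(0.5906, 10.0, 50.0, 60.0, 50.0, 10.0)
sd = math.sqrt(var_selectee_mean(0.5906, 10.0, 1.0, 287))
print(f"selectee mean {mean:.1f} S/S, standard error {sd:.2f} S/S")
```

With these inputs the sketch lands close to the 59.0 S/S and 0.9 S/S reported in the application.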


(c) The Standard Error of the Predicted Number of Successful Selectees at Score Level x

After testing it is often imperative to make some kind of forecast about the selected applicants that can be verified by subsequent validation. For an individual we cannot say more than that his chance of success is π(x), and we cannot check his a priori chance from a validation study. We may, however, check the a priori chance of the total group of selectees by comparing the actual proportion of successes with π₂ of equation (8a).

The total group of selectees is itself rather variable with respect to test scores, and in many applied situations the raw predictor cutting score λ' shifts in time with the needs of the employer and the abundance or shortage of labour. It is, therefore, more advantageous to compare the probabilities for groups of selectees falling within a predictor score range l₁ < x < l₂ with the proportions of successes at the follow-up stage. In fact, such a procedure gives real meaning to the probability of success of an individual, as it implies that, of E_x persons all attaining the same battery score x as he does, E_xπ(x) will in the long run prove successful on the job. The standard error of the predicted number of successful selectees is

$$\sigma\{E_x\,p_E(x)\} = \sqrt{E_x\,\pi(x)\,[1-\pi(x)]}. \tag{34}$$

For a group of E_x chosen applicants, all having test scores l₁ < x < l₂, we expect with confidence P = .99 that

$$E_x\,\pi(x) \pm 2.58\sqrt{E_x\,\pi(x)\,[1-\pi(x)]} \tag{35}$$

will prove successful in the occupation for which they were selected. Eq. (35) is an approximation, since for small E_x the binomial distribution deviates from the normal. π(x) has to be estimated from p(x) of equation (22). For small and moderate E_x the standard error of p_E(x) is much larger than that of p(x), and even for comparatively large numbers of applicants N the number per array E_x is moderate or smallish. Hence the substitution of

$$p(x) \approx \pi(x)$$

in (35) does not cause serious errors.
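Formula (35) is easy to apply row by row. The sketch below is a minimal version that assumes the limits are rounded to whole trainees and clipped to the possible range 0 to E_x; that rounding rule is an assumption, although it appears to reproduce the limits of Table 10 that are not marked as exact binomial values. The function name is hypothetical.

```python
import math


def success_limits(E, pi, z=2.58):
    """Normal-approximation limits (35) for the number of successes among
    E selectees with success probability pi, rounded and clipped to [0, E]."""
    mean = E * pi
    half = z * math.sqrt(E * pi * (1 - pi))
    return max(0, round(mean - half)), min(E, round(mean + half))


print(success_limits(55, 0.789))   # the 50-55 S/S row of Table 10
```

For very small E_x the normal approximation is poor, which is why the text falls back on the exact binomial distribution there.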

Throughout the paper we have tacitly assumed that we deal with 
a statistically uniform universe. If any one of the means, standard 
deviations, or covariances change in time, selection becomes useless, 
as all concepts of probability of success, efficiency index, etc. are af- 
fected. It is advisable to install a statistical quality control program 
for predictor and criterion variables in order to get advance infor- 
mation on time shifts of population parameters. 


III. Application


The foregoing principles were applied to data collected during 
World War II. A battery of eight tests was given to a group of artisan 
trainees of the South African Air Force before entering the training 
school. The whole group was permitted to go through the course at 
the end of which an objective trade test was given incorporating the 
subject matter covered during the training period. Altogether there 
were 287 trainees for whom both battery and criterion scores were 
available. 

In order to apply the formulas derived in this investigation, the data must conform to the assumption of a bivariate normal population. Both the raw battery and the raw criterion scores of the above group were non-normal. The first step, then, was to normalize the marginal distributions with the help of a graphical procedure,* which simultaneously standardized the distributions by making the sample means and standard deviations approximately equal to 50 and 10, respectively. The normalized standard score distributions of the group are given in Table 3, together with the χ² tests for normality. Although the graphical method used for normalizing the distributions constitutes a legitimate estimation procedure, it is not known how many degrees of freedom should be deducted from the


*A description of this method is considered to be beyond the scope of this 
study. 











TABLE 3*

Standard     Expected    Observed    Observed
Scores                   Battery     Criterion
                         Scores      Scores
80-85                    1           -
75-80                    1           1
70-75        4.7         5           5
65-70        12.6        10          10
60-65        26.4        26          31
55-60        43.0        39          45
50-55        55.0        55          56
45-50        55.0        53          48
40-45        43.0        44          47
35-40        26.4        31          27
30-35        12.6        13          11
25-30        4.7         8           5
20-25                    -           1
15-20                    1           -
N            287.0       287         287
χ²           -           1.947       2.924
P            -           .85         .71
Mean         50.0        49.3        50.1
S.D.         10.0        10.2        9.8

*The brackets in the table indicate the groupings for the χ² test.


total cell numbers. For expedience it is suggested that the customary three constraints be used. If we do this, we must not interpret the χ² test by means of a probability statement as we ordinarily do, but should regard it as a measure of the success of our normalization procedure.


FIG. 8. OBSERVED REGRESSION LINES (battery scores x and criterion scores y)


Although the marginal scores appear to have been normalized satisfactorily according to Table 3, it does not follow that the bivariate distribution is normal. The next step consists of setting up a bivariate table in which each subject's battery and criterion raw scores are converted into a pair of normalized standard scores (Table 4). Linearity of regression may then be tested by plotting the observed regression lines (Figure 8).


TABLE 4 (bivariate distribution of battery scores x and criterion scores y, in normalized standard scores)





The observed array means in Figure 8 do not deviate from linear regression more than would be expected in a sample of N = 287. We may now assume that the data of Table 4 conform to the hypothesis of a normal bivariate population. In standard scores we have

ȳ = 50.1,    x̄ = 49.3,
s_y = 9.8,    s_x = 10.2,
R = .591,    L = 45.0.


Substitution of these values into equation (22) yields the required best estimate of the S.O.C. curve. Prior to the derivation of the formulas in Part II, a detailed numerical analysis had been undertaken for

ȳ = 50.00,    x̄ = 50.00,
s_y = 10.00,    s_x = 10.00,
R = .5906,    L = 45.00.

As these values are not radically different from the correct sample values above, the author may be forgiven for using this rounded set, rather than the exact sample values, for purposes of illustration. The equation of the selector's operating characteristic becomes

$$p(x) = \Phi(3.03979 - .073188x),$$


where x is measured in standard scores. p(x) is evaluated in Table 5 and graphed together with the observations (based on Table 4) in Figure 9. The 99% confidence belt of the S.O.C. curve as derived from an expectancy chart may be obtained from Hald's Tables (9). For the estimation by equation (22) we have to lay off 2.58[var p(x)]^{1/2} from p(x) to obtain a corresponding 99% confidence belt. The expression for var p(x) (equation 25) contains the parameters ρ and β = (L − η)/σ_y, which may be estimated by R and (L − ȳ)/s_y. Substitution of the sample values gives

$$\operatorname{var} p(x) = .00442\,(X^2 + .358X + .941)\,\{\operatorname{erf}(-.619 - .731X)\}^2,$$

where X is estimated from (x − x̄)/s_x and the x's are the usual standard scores. The 99% upper and lower confidence limits for the expectancy chart method of estimation, and those for the statistical method, are given in Table 5. A glance at Fig. 9 shows how much narrower the confidence belt of the statistical estimation is. Incidentally, this particular battery furnishes an example of a confidence belt without a constriction in the centre, i.e., D > 0.
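The numerical constants in p(x) and var p(x) above follow from the rounded illustration set. The sketch below is a minimal check under the assumption that p(x) = Φ((β − RX)/√(1 − R²)) with Φ the upper-tail area, written in raw standard scores as Φ(c₀ − c₁x); the helper names are hypothetical.

```python
import math


def upper_tail(z):
    """Upper-tail normal area (the text's Phi)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def soc_line(ybar, xbar, s_y, s_x, R, L):
    """Return (c0, c1) with p(x) = Phi(c0 - c1 * x), x in raw standard scores."""
    k = math.sqrt(1 - R * R)
    beta = (L - ybar) / s_y
    return (beta + R * xbar / s_x) / k, R / (s_x * k)


def var_p_coefficients(R, beta, N):
    """Constants of var p(x) = k (X^2 + b X + c) {erf(g0 - g1 X)}^2, from (25)."""
    r2 = R * R
    k = (2 - r2) / (2 * N * (1 - r2))
    b = -2 * R * beta / (2 - r2)
    c = (beta**2 + 2 * (1 - r2)) / (2 - r2)
    root = math.sqrt(1 - r2)
    return k, b, c, beta / root, R / root


c0, c1 = soc_line(50.0, 50.0, 10.0, 10.0, 0.5906, 45.0)
k, b, c, g0, g1 = var_p_coefficients(0.5906, -0.5, 287)

x = 42.5                                   # one class midpoint of Table 5
X = (x - 50.0) / 10.0
ordinate = math.exp(-0.5 * (g0 - g1 * X) ** 2) / math.sqrt(2 * math.pi)
half_width = 2.58 * math.sqrt(k * (X * X + b * X + c) * ordinate**2)
print(round(c0, 5), round(c1, 6), round(upper_tail(c0 - c1 * x), 3), round(half_width, 3))
```

The linear coefficient b comes out positive here because β is negative, and p(x) ± half_width at x = 42.5 gives the statistical limits for that row of Table 5.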











TABLE 5

Battery  3.03979 −   Estimated  Observed  E_x         Observed   Expectancy Chart   2.58σ[p(x)]  Statistical Estimation
Score x  .073188x    p(x)       p_E(x)                Successes  Upper    Lower                  Upper    Lower
2.5      2.856       .002                                                          .005         .007     .000
7.5      2.491       .006
12.5     2.125       .017                                                          .027         .044     .000
17.5     1.759       .039                                                          .047         .086     .000
22.5     1.393       .082       .111      9 (15-30)   1          .585     .001     .071         .153     .011
27.5     1.027       .152                                                          .092         .244     .060
32.5     .661        .254       .385      13          5          .755     .094     .100         .354     .154
37.5     .295        .384       .290      31          9          .536     .110     .094         .478     .290
42.5     −.071       .528       .523      44          23         .716     .324     .076         .604     .452
47.5     −.437       .669       .679      53          36         .830     .494     .060         .729     .609
52.5     −.802       .789       .800      55          44         .917     .628     .052         .841     .737
57.5     −1.168      .879       .923      39          36         .991     .745     .046         .925     .833
62.5     −1.534      .938       .962      26          25         1.000    .747     .036         .974     .902
67.5     −1.900      .971       1.000     10          10         1.000    .589     .024         .995     .947
72.5     −2.266      .988                                                          .014         1.000    .974
77.5     −2.632      .996       1.000     7 (70-85)   7          1.000    .469     .007         1.000    .989
82.5     −2.998      .999                                                          .003         1.000    .996





FIG. 9. SELECTOR'S OPERATING CHARACTERISTIC (estimated S.O.C. curve, observed proportions of success, and confidence belts of the S.O.C. curve based on the expectancy chart and on statistical estimation; R = .591, N = 287; horizontal axis: battery scores x)











TABLE 6 (by criterion standard score y: estimated and observed probabilities of selection p(y), estimated ordinates of the criterion distributions, and class-interval probabilities)











FIG. 10. APPLICANT'S OPERATING CHARACTERISTIC (estimated A.O.C. curve, observed proportions of selection, and chance selection, estimated and observed; R = .591, N = 287, λ' = 60 S/S; vertical axis: probability of selection p(y); horizontal axis: criterion scores y)





The Applicant's Operating Characteristic was calculated from equation (22) for a battery cutting score of λ' = 60.00 S/S. It is

$$p(y) = \Phi(4.89862 - .073188y).$$
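The A.O.C. constants can be checked the same way as those of the S.O.C. curve. The sketch below assumes the rounded illustration set (ȳ = x̄ = 50, s_y = s_x = 10, R = .5906) and λ' = 60 S/S, with Φ as the upper-tail area. As an internal consistency check, it averages p(y) over the normal criterion distribution with Simpson's rule, which should return the selection ratio Φ(a); the integration range and step count are arbitrary choices.

```python
import math


def phi(z):
    """Standard normal ordinate (the text's erf)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """Upper-tail normal area (the text's Phi)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


R, YBAR, XBAR, S_Y, S_X, LAM = 0.5906, 50.0, 50.0, 10.0, 10.0, 60.0
K = math.sqrt(1 - R * R)
a = (LAM - XBAR) / S_X              # predictor cut in standard measure
d0 = (a + R * YBAR / S_Y) / K       # constant of the A.O.C.
d1 = R / (S_Y * K)                  # slope per criterion standard score


def aoc(y):
    """Probability of selection p(y) = Phi(d0 - d1*y) for criterion score y."""
    return upper_tail(d0 - d1 * y)


# Average p(y) over the criterion distribution (Simpson's rule in Y).
n, lo, hi = 4000, -10.0, 10.0
h = (hi - lo) / n
g = lambda Y: phi(Y) * aoc(YBAR + S_Y * Y)
avg = (g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h) for i in range(1, n))) * h / 3
print(round(d0, 5), round(avg, 5), round(upper_tail(a), 5))
```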


Estimated and observed values are shown in Table 6 and Figure 10. According to Table 4 there were 43 trainees who exceeded the battery cutting score λ' = 60.0; these 43 would have been chosen in an actual battery selection. To show what happens if 43 applicants are selected by a chance procedure, the numbers 1 to 287 were assigned to the trainees. With the help of a table of random sampling numbers, 43 numbers were drawn at random, and the battery and criterion scores of the individuals to whom the drawn numbers pertained were noted. The result of this experiment is given in Table 7 and plotted in Figure 10. It shows again, experimentally, that when the selection procedure follows a chance pattern the probability of selection is the same for all individuals, irrespective of their true abilities.

TABLE 7

Criterion   Observed    Observed    Observed     Expected     Observed Proportionate
Score       Frequency   Frequency   Proportion   Proportion   Frequencies, N = 43
Range       N = 43      E_y         of Chance    of Chance    Chance       Battery
y (S/S)                 N = 287     Selection    Selection    Selection    Selection
75-         -           1           .00          .16          .000         .023
65-75       2           15          .13          .16          .047         .209
55-65       13          76          .17          .16          .302         .489
45-55       15          104         .14          .16          .349         .256
35-45       10          74          .14          .16          .232         .023
25-35       3           16          .19          .16          .070         .000
-25         -           1           -            .16          .000         .000
Totals      43          287         .150         .159         1.000        1.000

We now proceed to set up the Quality-Gain Diagram. The true ability (criterion) distribution of the unselected population of applicants (and also of the chance-selected trainees) is

$$\operatorname{erf}(Y)\,dY = \frac{1}{\sqrt{2\pi}}\,e^{-Y^2/2}\,dY,$$

with Y in standard measures. As

$$Y = \frac{y-\eta}{\sigma_y},$$

we estimate Y by (y − ȳ)/s_y, so that we may find the ordinates of the criterion distribution of the chance-selected group from

$$\operatorname{erf}\left(\frac{y-\bar y}{s_y}\right),$$

where the y's are observed standard scores. These ordinates are worked out in the fourth column of Table 6 and graphed as the solid curve in Figure 11. The criterion distribution of the selectees was given in equation (11) as

$$\psi(Y)\,dY = \frac{1}{\Phi(a)}\operatorname{erf}(Y)\,\pi(Y)\,dY.$$

Its ordinates may be estimated from

$$\frac{1}{\Phi\left(\frac{\lambda'-\bar x}{s_x}\right)}\operatorname{erf}\left(\frac{y-\bar y}{s_y}\right)p(y).$$


Numerical values are given in the fifth column of Table 6 and are shown as the broken curve in Figure 11. To compare the theoretical distributions with the observations, the observed proportionate frequencies are listed in the last two columns of Table 7. The sample size for both methods is N = 43.

FIG. 11. QUALITY GAIN DIAGRAM (criterion distributions of chance-selected and battery-selected trainees, divided into successes and failures; N = 43, R = .591, L = 45 S/S, λ' = 60 S/S; horizontal axis: criterion scores y)

The chance-selection figures are based on the observations of the drawing experiment (Table 7), and the battery-selection data come from the criterion distribution of the selectees in Table 4. All observations are plotted in Figure 11 and show satisfactory agreement with the estimated distributions. It was mentioned previously that formulas (7) and (8a) could be evaluated numerically by mechanical quadrature. Formula (8a) may be written

$$\pi_2 = [\Phi(a)]^{-1}\int_{\beta}^{+\infty}\operatorname{erf}(Y)\,\pi(Y)\,dY,$$

and its statistical estimate is

$$p_2 = \left[\Phi\left(\frac{\lambda'-\bar x}{s_x}\right)\right]^{-1}\int_{(L-\bar y)/s_y}^{+\infty} p(y)\,\operatorname{erf}\left(\frac{y-\bar y}{s_y}\right)d\left(\frac{y-\bar y}{s_y}\right).$$





A comparison of the heading of the fifth column in Table 6 with the function to be integrated in this latter expression shows the two to be identical. We can, therefore, carry out the integration by first switching from ordinates to areas with the help of a quadrature formula and then adding the partial areas from the upper tail down to the limit (L − ȳ)/s_y, i.e., L = 45.0 S/S. The ordinates were calculated for the midpoints of the class intervals, so we can use the simple quadrature formula

$$\int_{x_1 - W/2}^{x_n + W/2} f(x)\,dx \approx W\,(y_1 + y_2 + \cdots + y_n),$$

where y_1, …, y_n are the ordinates at the class midpoints x_1, …, x_n and W is the width of the class interval in standard measure. The resulting areas (which represent the class-interval probabilities of the estimated criterion distribution of selectees) are shown in the sixth column of Table 6. Adding the probabilities above 45 S/S finally gives the estimated proportion of successful selectees

$$p_2 = .952.$$

Computation of p₂ with the help of Pearson's Tables is more complicated but more exact. It leads to

$$p_2 = .9517.$$
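The midpoint quadrature can be imitated directly. The sketch below assumes the rounded illustration values (R = .5906, a = (λ' − x̄)/s_x = 1.0, β = (L − ȳ)/s_y = −.5), class midpoints half a class above β in steps of W = 0.5, and a fine Simpson rule as a stand-in for the tabular calculation; it is not the author's worksheet, and the names are hypothetical.

```python
import math


def phi(z):
    """Standard normal ordinate (the text's erf)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """Upper-tail normal area (the text's Phi)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


RHO, A, BETA = 0.5906, 1.0, -0.5    # R, (lambda' - xbar)/s_x, (L - ybar)/s_y
K = math.sqrt(1 - RHO**2)


def selectee_ordinate(Y):
    """Ordinate of the estimated criterion distribution of selectees at Y."""
    return phi(Y) * upper_tail((A - RHO * Y) / K) / upper_tail(A)


def p2_midpoint(W=0.5, top=5.0):
    """Midpoint rule: ordinates at class midpoints above BETA, times W."""
    total, Y = 0.0, BETA + W / 2
    while Y < top:
        total += selectee_ordinate(Y)
        Y += W
    return W * total


def p2_simpson(n=4000, top=10.0):
    """Fine composite Simpson rule over [BETA, top] for comparison."""
    h = (top - BETA) / n
    s = selectee_ordinate(BETA) + selectee_ordinate(top)
    s += sum((4 if i % 2 else 2) * selectee_ordinate(BETA + i * h) for i in range(1, n))
    return s * h / 3


coarse, fine = p2_midpoint(), p2_simpson()
print(f"coarse {coarse:.4f}  fine {fine:.4f}  wastage {100 * (1 - fine):.1f}%")
```

With these inputs the coarse sum is expected to fall within a few thousandths of the published .952 and the fine quadrature close to .9517; the W = 0.5 grid accounts for the difference between them.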


The estimated course wastage rate is

$$\text{W.R.} = 100(1 - p_2) = 4.8\%.$$

For a chance selection we have from equation (8b)

$$\pi_1 = \Phi(\beta),$$

and its estimate is

$$p_1 = \Phi\left(\frac{L-\bar y}{s_y}\right) = .691.$$

The corresponding wastage rate is

$$\text{W.R.} = 100(1 - p_1) = 30.9\%.$$

The course wastage rates are shown as the shaded portions in Figure 11. In this example, test selection reduces the course wastage rate of the chance procedure by more than five sixths. The estimated efficiency index of the battery is, from equation (8),

$$\hat H = 100(p_2 - p_1) = 26.1\%.$$

Consequently, by means of this test battery, we expect to gain over chance procedure 26 successful trainees in every 100 selected.











The estimated mean criterion score of the chance-selected trainees is ȳ = 50.0 S/S, and that of the battery-selected group is given by equation (32):

$$\bar y_s = r\,s_y\,\frac{\operatorname{erf}\left(\frac{\lambda'-\bar x}{s_x}\right)}{\Phi\left(\frac{\lambda'-\bar x}{s_x}\right)} + \bar y = 59.0 \text{ S/S}.$$

Hence the average quality of the battery-selected group represents an improvement of 9 standard-score points (0.9 standard deviation) over a chance-selected group.
Table 8 compares some expected and observed quantities used to describe various aspects of selection. The expected values are statistical estimates derived from the total sample of N = 287, whereas the observed values are based on samples of N = 43, the number of trainees actually selected, as shown in Table 7.

TABLE 8

                                    Battery Selection           Chance Selection
Parameter                           Statistical   Observed      Statistical   Observed
                                    Estimate      N = 43        Estimate      N = 43
                                    N = 287                     N = 287
Percentage of successful
  selectees (100π₂; 100π₁)          95%           98%           69%           70%
Wastage rates                       5%            2%            31%           30%
Efficiency Index H                  26%           28%           -             -
Criterion mean scores (η_s; η)      59.0          59.5          50.0          50.2

The standard error of ȳ, based on the total sample of N = 287, is σ(ȳ) = s_y/√N ≈ 0.6 S/S, and that of ȳ_s, also based on N = 287 and computed from equation (33), is

$$\sigma(\bar y_s) = 0.9 \text{ S/S}.$$


Suppose now that we were permitted to shift the criterion cutting score L without, however, altering the selection ratio and, for that matter, λ'. For maximum efficiency index we have from equation (10)

$$\beta_{\max} = \left(\frac{1-\sqrt{1-\rho^2}}{\rho}\right)a.$$

The estimate of β_max is

$$\hat\beta_{\max} = \left(\frac{1-\sqrt{1-R^2}}{R}\right)\left(\frac{\lambda'-\bar x}{s_x}\right),$$

and

$$L_{\max} = s_y\,\hat\beta_{\max} + \bar y = 53.3 \text{ S/S}.$$


For such a cutting score the Efficiency Index H is approximately 38%, which is considerably higher than the value found for L = 45 S/S. To show that this upward difference in efficiency is also present in the sample of 287 trainees, let us put L at 55 S/S, which is not far removed from L_max. From Table 4 we see that 31 of the 43 selected would have had criterion scores > 55 S/S, so that

$$100\,p_2' = \frac{31}{43} = 72\%.$$

If no selection had taken place, 92 of the 287 trainees would have had criterion scores > 55 S/S according to the same table, and

$$100\,p_1' = \frac{92}{287} = 32\%.$$

Consequently

$$\hat H = 100(p_2' - p_1') = 40\%,$$

which is 12 percentage points higher than the observed Efficiency Index for L = 45 S/S. (That this H exceeds the maximum of 38% is due to chance, as p₂' is based on N = 43 only.)
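The location of the optimum can be checked by computing H as a function of the criterion cut β for a fixed predictor cut. The sketch below assumes the rounded illustration values (R = .5906, a = 1.0, s_y = 10, ȳ = 50), evaluates π₂ by Simpson's rule and π₁ as an upper-tail area, and compares H at β_max with nearby cuts; the helper names and the integration settings are illustrative choices.

```python
import math


def phi(z):
    """Standard normal ordinate (the text's erf)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """Upper-tail normal area (the text's Phi)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


RHO, A = 0.5906, 1.0                # R and (lambda' - xbar)/s_x
K = math.sqrt(1 - RHO**2)


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) panels."""
    h = (hi - lo) / n
    s = f(lo) + f(hi) + sum((4 if i % 2 else 2) * f(lo + i * h) for i in range(1, n))
    return s * h / 3


def efficiency_index(beta):
    """H(beta) = pi_2 - pi_1 for a fixed predictor cut a."""
    pi1 = upper_tail(beta)
    pi2 = simpson(lambda Y: phi(Y) * upper_tail((A - RHO * Y) / K), beta, 10.0) / upper_tail(A)
    return pi2 - pi1


beta_max = (1 - K) / RHO * A        # equation (10)
L_max = 10.0 * beta_max + 50.0      # s_y * beta_max + ybar
print(f"L_max = {L_max:.1f} S/S, H(beta_max) = {efficiency_index(beta_max):.3f}, "
      f"H(-0.5) = {efficiency_index(-0.5):.3f}")
```

H(−0.5) corresponds to the original cut L = 45 S/S, and the value at β_max should sit near the 38% quoted in the text.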


In constructing the Cost-Utility Diagram we again use the ordinates of the estimated criterion distribution of the total applicant population shown in the fourth column of Table 6. The corresponding observations are given by the marginal criterion distribution of the total number of trainees in Table 4. To reduce the actual frequencies E_y to comparable ordinates, we divide the observed cell frequencies by the factor NW, where N = 287 is the total sample number and W = 0.5 is the width of the class interval in standard measures. The estimated and observed criterion distributions of all the trainees are plotted in Figure 12.

































FIG. 12. COST-UTILITY DIAGRAM (criterion distributions of unselected and selected trainees, divided into successes and failures at L = 45 S/S; λ' = 60 S/S, R = .591, N = 287, N_s = 43; expected (observed) counts in the four areas: 155 (154), 87 (90), 43 (42), 2 (1); horizontal axis: criterion scores y)





For the purpose of the Cost-Utility Diagram the criterion distribution of the selectees is given by equation (20),

$$\psi'(Y)\,dY = \operatorname{erf}(Y)\,\pi(Y)\,dY,$$

and its ordinates are estimated from

$$\operatorname{erf}\left(\frac{y-\bar y}{s_y}\right)p(y).$$

The numerical values are computed by multiplying the figures in the fifth column of Table 6 by the factor Φ((λ' − x̄)/s_x). They are shown in the last column of the same table. To arrive at the corresponding observed ordinates, we first form the marginal criterion distribution of the N_s = 43 selectees by adding all columns above the cutting score λ' = 60 S/S in Table 4. Next we divide the individual cell frequencies by the factor NW used before. Finally, observed and estimated ordinates are plotted in the Cost-Utility Diagram (Fig. 12). The axis through L = 45 S/S divides the Cost-Utility Diagram into four areas representing four groups of trainees. The expected numbers in these partitions may be calculated from the following fourfold Table 9(a).








TABLE 9

(a) Expected numbers

              Capable trainees     Incapable trainees           Total
Rejected      N(π₁ − π_sπ₂)        N(1 − π₁ − π_s + π_sπ₂)      N(1 − π_s)
Selected      Nπ_sπ₂               N(π_s − π_sπ₂)               Nπ_s
Total         Nπ₁                  N(1 − π₁)                    N

(b) Expected numbers for N = 287 (observed in parentheses)

              Capable trainees     Incapable trainees           Total
Rejected      155 (154)            87 (90)                      242 (244)
Selected      43 (42)              2 (1)                        45 (43)
Total         198 (196)            89 (91)                      287 (287)

where

$$\pi_s = \Phi(a) = \Phi\left(\frac{\lambda-\xi}{\sigma_x}\right),$$

and π₁ and π₂ are as previously defined. The estimates of π_s, π₁, and π₂ were also computed before. They are

p_s = .159,
p₁ = .691,
p₂ = .952.


For N = 287 we expect the four groups to be of the magnitudes indicated in Table 9(b). The actually observed frequencies derived from Table 4 are shown in parentheses. To the casual observer, the agreement between expected and observed frequencies in Table 9(b), and in general between all the expected and observed values mentioned in this investigation, may seem too good to be true. The reason, of course, is that six constraints are actually operating, since the means, variances, covariance, and number of the sample are made equal to the corresponding quantities in the expected bivariate distribution. Furthermore, it is not clear how many more degrees of freedom were lost in the initial normalization procedure applied to the raw score sample.
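The cells of Table 9(a) are simple products of the three estimated proportions. The sketch below evaluates them for N = 287, p_s = .159, p₁ = .691 and p₂ = .952; the unrounded values are close to, but not in every case identical with, the whole numbers printed in Table 9(b). The function and key names are hypothetical.

```python
def fourfold(N, ps, p1, p2):
    """Expected cell counts of Table 9(a) from the selection ratio ps and the
    success proportions p1 (chance) and p2 (among selectees)."""
    return {
        "capable, selected": N * ps * p2,
        "incapable, selected": N * (ps - ps * p2),
        "capable, rejected": N * (p1 - ps * p2),
        "incapable, rejected": N * (1 - p1 - ps + ps * p2),
    }


cells = fourfold(287, 0.159, 0.691, 0.952)
for name, value in cells.items():
    print(f"{name:22s} {value:6.1f}")
```

The four cells always add back to N, and the two "capable" cells add to Np₁, which is a quick check on the formulas in Table 9(a).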

In Table 10 a final comparison between the expected and ob- 
served number of successful selectees is given. In addition upper and 
lower probability limits are shown. The numerical values are based 
on formula (35). 

















TABLE 10

Standard    E_x    p(x)    Expected      Observed       99% Probability
Scores x                   Successes     Successes      Limits
                           E_x p(x)      E_x p_E(x)     Lower    Upper
80-85       1      .999    1.0           1              1*       1*
75-80       1      .996    1.0           1              1*       1*
70-75       5      .988    4.9           5              4*       5*
65-70       10     .971    9.7           10             8        10
60-65       26     .938    24.4          25             21       26
55-60       39     .879    34.3          36             29       39
50-55       55     .789    43.4          44             36       51
45-50       53     .669    35.5          36             27       44
40-45       44     .528    23.2          23             15       32
35-40       31     .384    11.9          9              5        19
30-35       13     .254    3.3           5              0        7
25-30       8      .152    1.2           1              0*       4*
20-25       -      .082    -             -              -        -
15-20       1      .039    0.0           0              0*       1*
Totals      287            193.8         196





Probability limits marked with asterisks were computed from the exact binomial distribution. Expected and observed frequencies of successful selectees are closer than the probability limits would suggest. This again is due to the fact that the theoretical probabilities of success π(x) were estimated from this particular sample. In addition, we now introduce new constraints by using the individual sample frequencies E_x in computing the expected number of successes. This also accounts for the discrepancy between the total number of expected successes in Table 10 (193.8) and in Table 9(b) (198). A far more stringent test would be to compare expected and observed successes in an independent sample, using the estimates derived from the original one.

However, the approach to subsequent follow-up studies indicated in Table 10 has many advantages. After the initial validation we generally deal with truncated samples, because selection has already taken place. Cutting scores are very frequently adjusted to the needs of the day, and a short-term intake is often not representative of the long-term distribution of scores above the cut-off point. Yet we need not be disturbed by any of these factors if we make predictions in terms of probable successes and validate them by checking that these predictions come true within the previously stated probability limits.

REFERENCES 

1. McClelland, W. Selection for secondary education. London: Univ. of London 
Press, 1942. 

2. Bittner, R. H., and Wilder, C. E. Expectancy tables: a method of interpret- 
ing correlation coefficients. J. exp. Educ., 1946, 14, 245-252. 

3. Bingham, W. V. Great expectations. Personnel Psych., 1949, 2, 397-404. 

4. Taylor, H. C., and Russell, J. T. The relationship of validity coefficients in 
the practical effectiveness of tests in selection: Tables and Discussions. J. 
appl. Psych., 1939, 23, 565-578. 

5. Brogden, H. E. When testing pays off. Personnel Psych., 1949, 2, 170-183. 

6. Jarrett, R. F. Per cent increase in output of selected personnel as an index of test efficiency. J. appl. Psych., 1948, 32, 135-145.

7. Berkson, J. “Cost-utility” as a measure of the efficiency of a test. J. Amer. statist. Ass., 1947, 42, 246-255.

8. Kendall, M. G. The advanced theory of statistics, Vol. II. London: Charles Griffin and Co., 1946.

9. Hald, A. Statistiske metoder, Tabel- og formelsamling. Copenhagen, 1948, pp. 54-55.


Manuscript received 12/20/50. 




















PSYCHOMETRIKA—VOL. 17, NO. 1 
MARCH, 1952 


NOTE ON THE COMPUTATION OF BISERIAL CORRELATIONS 
IN ITEM ANALYSIS 


LAURENCE SIEGEL 
STATE COLLEGE OF WASHINGTON 
AND 
EDWARD E. CURETON 
UNIVERSITY OF TENNESSEE 


A rapid method is described for machine computation of biserial 
correlations in item analysis with several criteria. This method has 
been found to yield biserial correlations from punched IBM cards 
at the rate of about 41 per hour. 


Biserial correlation techniques are used extensively in determin- 
ing the discriminating powers of test items. Dubois, Dunlap, and 
Royer (1, 2, 3) have described methods for the rapid calculation of 
biserial correlations by means of nomographs or IBM equipment. The 
most recent of these methods (1) requires that each item be coded 
in a separate column of a standard 80-column IBM card. 

The present method was devised in connection with a problem 
which required the correlating of 483 item-responses with each of 
eleven criteria (N = 597). Each item was actually an alternative of 
one of 97 multiple-choice questions in a biographical inventory. Each 
item-response is of course dichotomous; the individual either does 
or does not mark it. Ten item-responses were punched in each col- 
umn of the IBM card, punching only if the item was marked.* The 
criterion scores were assumed to represent continuous variables, but 
each of them was coded in ten categories (0-9) so it could be punched 
as one digit in a single column of the card. The card design was as 
follows: 


Columns 1-4 Subject identification number 
Columns 5-53 Item responses 
Columns 60-70 Criterion scores 
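The multi-punched layout can be mimicked in a few lines. This is a minimal sketch under a stated assumption: items are numbered 0 to 482 and packed ten to a column in order, so that item k occupies column 5 + k // 10 at punch row k % 10. The paper does not give the actual item order, and the helper names are hypothetical. A card is modelled as a mapping from column number to the set of rows punched in it.

```python
# Hypothetical model of the card design above. The item-to-punch mapping
# (item k -> column 5 + k // 10, row k % 10) is an assumption.
ITEM_COLUMNS = range(5, 54)        # columns 5-53: item responses
CRITERION_COLUMNS = range(60, 71)  # columns 60-70: eleven criterion codes


def item_position(k):
    """Return the (column, punch row) assumed for item k, numbered from 0."""
    col, row = 5 + k // 10, k % 10
    if col not in ITEM_COLUMNS:
        raise ValueError(f"item {k} does not fit on the card")
    return col, row


def punch_card(subject_id, marked_items, criterion_codes):
    """Build a card as {column: set of punched rows}."""
    card = {}
    # Columns 1-4: subject identification number, one digit per column.
    for col, digit in zip(range(1, 5), f"{subject_id:04d}"):
        card[col] = {int(digit)}
    # Item responses: punch only the items the subject marked.
    for k in marked_items:
        col, row = item_position(k)
        card.setdefault(col, set()).add(row)
    # Criterion scores: each coded 0-9 and punched as a single digit.
    for col, code in zip(CRITERION_COLUMNS, criterion_codes):
        card[col] = {code}
    return card


def is_marked(card, k):
    """True if item k is punched on the card."""
    col, row = item_position(k)
    return row in card.get(col, set())


def criterion_scores(card):
    """Read back the eleven criterion codes."""
    return [next(iter(card[col])) for col in CRITERION_COLUMNS]


codes = [3, 5, 0, 9, 1, 2, 7, 4, 6, 8, 5]  # hypothetical criterion codes
card = punch_card(17, marked_items=[0, 12, 482], criterion_codes=codes)
```

Under this ordering the last item, number 482, falls in column 53, so the 49 item columns (490 punch positions) hold all 483 item-responses.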


The formula for the biserial correlation may be written: 


*An alternative procedure would have been to code each item in a separate 
column and to use the reproducer to punch the criterion scores into multiple cards. 





















$$ r_{bis} = \frac{m - M}{\sigma} \cdot \frac{p}{z} \tag{1} $$
where
m is the mean criterion score of the group selecting a given item;
M is the mean criterion score of the total group;
σ is the standard deviation of the criterion scores of the total group;
p is the proportion of the total group who marked the given item; and
z is the ordinate of the unit normal distribution corresponding to the tail-area p.
Let
N be the number in the total group,
n the number marking the given item,
Σ a summation from 1 to N, and
S a summation from 1 to n.
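Then
$$ p = \frac{n}{N} \tag{2} $$
$$ m = \frac{\mathrm{S}X}{n} \tag{3} $$
$$ M = \frac{\Sigma X}{N} \tag{4} $$
$$ \sigma^2 = \frac{\Sigma X^2 - M\,\Sigma X}{N - 1} \tag{5} $$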


Formula (1) may be rewritten in the form, 
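$$ r_{bis} = \left(\frac{1}{N\sigma}\right)\left(\frac{1}{z}\right)(\mathrm{S}X) - \left(\frac{p}{z}\right)\left(\frac{M}{\sigma}\right) \tag{6} $$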


A single sort on each criterion provides a frequency distribution from which M and σ may be computed by (4) and (5). The values of (1/Nσ) and (M/σ) are then constant for all item-correlations with a given criterion.
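The per-criterion step can be sketched as follows. This is a minimal illustration, assuming the criterion is coded 0-9 and that the single sort yields a count for each code; the frequency distribution below is invented (it merely sums to N = 597), and the function name is hypothetical. M follows (4) and σ follows (5) with divisor N - 1.

```python
from math import sqrt


def criterion_constants(freq):
    """freq[x] = number of subjects with criterion code x (x = 0..9).

    Returns (M, sigma, 1/(N*sigma), M/sigma), using formulas (4) and (5).
    The last two values stay fixed for every item correlated with this criterion.
    """
    N = sum(freq)
    sum_x = sum(x * f for x, f in enumerate(freq))
    sum_x2 = sum(x * x * f for x, f in enumerate(freq))
    M = sum_x / N                                 # formula (4)
    sigma = sqrt((sum_x2 - M * sum_x) / (N - 1))  # formula (5)
    return M, sigma, 1 / (N * sigma), M / sigma


# Invented frequency distribution of criterion codes 0-9 for N = 597.
freq = [12, 30, 55, 80, 110, 120, 90, 60, 30, 10]
M, sigma, inv_N_sigma, M_over_sigma = criterion_constants(freq)
```

Working from the frequency distribution rather than the 597 individual scores mirrors the single sort: only ten counts are needed per criterion.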
















The cards are then sorted on each item position. (Wherever a 
question has mutually exclusive alternatives, a single sort provides 
subgroups of cards for each of the alternatives.) The subgroup of 
cards for those who marked each given item is then run through the 
tabulator, which is used to accumulate the eleven sets of criterion 
scores. The card count is the value of n for the item, and the eleven totals are the values of SX for the eleven criteria.

Since N remains constant throughout, a special table giving values of (1/z) and (p/z) directly for all possible values of n may be
prepared from standard tables of the normal distribution. Dunlap 
(2) has already tabulated the values of (p/z) with argument p. The 
values of (1/z) and (p/z) are constant for each item in all criteria; 
only SX is different for every item and criterion. 
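The special table and formula (6) can be sketched with Python's standard-library `statistics.NormalDist` standing in for printed normal tables. This is an illustration under assumptions, not the authors' worksheet: the function names are hypothetical, and the item in the usage lines (120 of 597 subjects marking it, criterion total 600, M = 4.2, σ = 2.1) is invented. The direct use of formula (1) is included only as a cross-check.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ordinate(p):
    """Unit-normal ordinate z at the point cutting off a tail area p."""
    return UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(1 - p))


def ordinate_table(N):
    """Map each possible count n (1..N-1) to (1/z, p/z), with p = n/N."""
    table = {}
    for n in range(1, N):
        p = n / N
        z = ordinate(p)
        table[n] = (1 / z, p / z)
    return table


def biserial_from_table(SX, n, N, M, sigma, table):
    """Formula (6): r = (1/(N*sigma)) (1/z) SX - (p/z) (M/sigma)."""
    inv_z, p_over_z = table[n]
    return (1 / (N * sigma)) * inv_z * SX - p_over_z * (M / sigma)


def biserial_direct(m, M, sigma, p):
    """Formula (1): r = ((m - M) / sigma) * (p / z)."""
    return (m - M) / sigma * p / ordinate(p)


# Hypothetical item: 120 of 597 subjects marked it; their criterion total is 600.
N = 597
table = ordinate_table(N)
r6 = biserial_from_table(SX=600, n=120, N=N, M=4.2, sigma=2.1, table=table)
r1 = biserial_direct(m=600 / 120, M=4.2, sigma=2.1, p=120 / N)
```

Because the table is keyed by n, only the card count from the tabulator is needed to look up both 1/z and p/z; the two routes give the same coefficient.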

The sorting and tabulating for 5,313 correlations required 85 
hours. Less sorting time would have been required if all questions 
had had mutually exclusive alternatives. Subsequent computations on 
a desk calculator produced biserial correlations at an average rate 
of about 120 per hour. Thus the data were converted from punched 
IBM cards to biserial correlations at the rate of about 41 per hour. 
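The overall rate can be checked from the figures just given. Assuming that the 85 hours of sorting and tabulating and the desk-calculator time (at about 120 correlations per hour) simply add:

$$
483 \times 11 = 5313, \qquad \frac{5313}{120} \approx 44.3\ \text{h}, \qquad \frac{5313}{85 + 44.3} \approx \frac{5313}{129.3} \approx 41\ \text{correlations per hour}.
$$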


REFERENCES 
1. Dubois, Philip H. Note on the computation of biserial r in item validation.
Psychometrika, 1942, 7, 143-146.
2. Dunlap, Jack W. Note on computation of biserial correlations in item evalua- 
tion. Psychometrika, 1936, 1, 51-60. 
3. Royer, Elmer B. A machine method for computing the biserial correlation 
coefficient in item validation. Psychometrika, 1941, 6, 55-59. 


Manuscript received 4/16/51. 
Revised manuscript received 6/4/51. 

















PSYCHOMETRIKA—VOL. 17, NO. 1 
MARCH, 1952 


FACTOR ANALYSIS OF THE ARMY AIR FORCES SHEPPARD 
FIELD BATTERY OF EXPERIMENTAL 
APTITUDE TESTS* 


J. P. GUILFORD 
UNIVERSITY OF SOUTHERN CALIFORNIA 


BENJAMIN FRUCHTER 
UNIVERSITY OF TEXAS 
AND 
WAYNE S. ZIMMERMAN 
BRANDEIS UNIVERSITY 


A factor analysis was made of 39 experimental printed aptitude 
tests and seven reference tests selected from the Army Air Forces 
Aircrew Classification Battery. Thirteen factors were extracted and 
two independent orthogonal rotational solutions were completed. 
Twelve factors were interpreted. Of these, seven were clearly identi- 
fiable with previously known factors: numerical, perceptual-speed, 
spatial-relations, visualization, visual-memory, paired-associates- 
memory, and length-estimation factors. A planning factor was not 
as clearly identifiable. A reasoning factor was probably a composite 
of two or more factors that failed to separate. A new factor possibly 
has to do with orientation with respect to the points of the compass. 
Two factors were doublets, each apparently specific to one kind of 
test. Better conceptions were gained of the spatial-relations and vis- 
ualization factors and of the kinds of tests that measure them best. 
Efforts to improve measures of unique factors were not uniformly 
successful. The attempt to duplicate a psychomotor test rather di- 
rectly by analogy in printed form failed almost completely. 


Introduction 


In the late stages of World War II, Psychological Units of the 
Army Air Forces Aviation Psychology Research Program had de- 
veloped a large number of printed experimental tests for which no 
validity or factor-analysis information had been obtained. In the 


*For support of a large part of the investigation the senior author is indebted to the Social Science Research Council for a grant-in-aid. Dr. Fruchter
supervised the computational work in the extraction of factors and in one of the 
rotational solutions. Much of the work was done while he was employed by the 
Air Training Command Human Resources Research Center. While the opinions 
and conclusions are those of the authors, they wish to express their apprecia- 
tion to that organization and especially to Dr. John T. Dailey for making the 
facilities possible and to Mr. William B. Lecznar for technical and computational 
assistance. 












construction of the tests, efforts had been made to achieve better and 
more unique measures of certain primary abilities, to achieve a bet- 
ter understanding of those abilities, and to determine whether other, 
hypothesized, factors would be found. Since aircrew training had 
been very materially reduced in 1945, it was very unlikely that the 
trainees who were tested at the time would yield validation data. On 
the other hand, the validities of the better known factors had been 
so well estimated that it would be possible to make reasonable guesses 
of the minimum validities of the new experimental tests from a 
knowledge of their loadings in those factors (5, p. 843). All of these 
considerations made factor-analysis studies of the new tests extreme- 
ly desirable. 

Two large experimental testing projects were accordingly 
planned and executed during the spring and summer of 1945. In the 
first one, a group of 45 experimental tests was administered to a 
large sample of Aviation Students on the day immediately following 
the battery of 20 classification tests, at Sheppard Field. This set of 
tests became known as the Sheppard Field Battery. At that time, the 
Aviation Students were almost entirely at the eighteen- and nine- 
teen-year levels. The total number involved in the study was 8,158, 
but since only one day’s time was available for experimental testing, 
and since not all 45 tests could be completed in one day, not all of 
this sample took all tests. The 45 tests were subdivided into five half- 
day batteries of approximately nine tests each. Each sub-battery was 
administered in combination with every other sub-battery to approxi- 
mately 400 students. Within sub-batteries the correlations were based 
upon nearly 1,600 students. The correlations between experimental 
and classification tests were based upon similar numbers. The classi- 
fication-test intercorrelations were based upon the entire sample of 
8,158 students. The entire correlation matrix for the 65 tests has 
been published in one of the Army Air Forces Reports (5, p. 902), 
with full particulars about sample N’s. The portion of that matrix 
analyzed for this report is given in Table 1. 
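The sample sizes quoted above follow from the pairing design. A minimal sketch, assuming exactly five sub-batteries, that every pair of sub-batteries was given to its own group of about 400 students, and that no student took more than one pair (the function name is hypothetical): each sub-battery then appears in four pairs, so correlations within a sub-battery rest on about 4 × 400 = 1,600 students, while a correlation between two experimental sub-batteries rests on the single group of about 400 who took that pair.

```python
from itertools import combinations


def pairing_design(n_sub_batteries=5, n_per_pair=400):
    """Give every pair of sub-batteries to its own group of students.

    Returns the list of pairs and, for each sub-battery, the number of
    students who took it (the N behind within-sub-battery correlations).
    """
    batteries = range(1, n_sub_batteries + 1)
    pairs = list(combinations(batteries, 2))
    takers = dict.fromkeys(batteries, 0)
    for a, b in pairs:
        takers[a] += n_per_pair
        takers[b] += n_per_pair
    return pairs, takers


pairs, takers = pairing_design()
print(len(pairs), takers)  # 10 {1: 1600, 2: 1600, 3: 1600, 4: 1600, 5: 1600}
```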

To be more specific concerning objectives for the experimental 
tests, special efforts had been made to measure better the factors of 
perceptual speed, spatial relations, visualization, visual memory, and 
length estimation. There were also efforts to clarify the nature of the 
factors of space, visualization, reasoning, and planning, particularly. It was thought quite possible that two new factors would be found, one involving space tests in which compass directions played a prominent role, and one involving tests of ability to resist illusions in geometric patterns. These objectives will be elaborated upon, and others mentioned, as the tests are described.

The analysis to be reported here did not include all 65 tests, for 
several reasons. The classification tests (September 1944 battery) 
had been analyzed previously and the results reported (5, p. 812). 
Certain classification tests were included in the present analysis be- 
cause they would help to solve for some of the known factors that 
would be expected in the experimental data. By excluding others 
that had little relation to the experimental tests, the number of fac- 
tors with which it was necessary to deal was reduced. The classifi- 
cation-battery factors of psychomotor coordination, judgment, me- 
chanical knowledge, and verbal comprehension were thus ruled out. 
Two experimental tests of the last two factors were also eliminated when they were found to correlate very little with the other experimental tests. The two illusion tests were also omitted because it was
likely that they would merely add a doublet factor to the structure. 
There remained, then, 46 tests in the analysis, seven of which were 
classification tests included for defining purposes. The 46 tests are 
described very briefly on the following pages.* 


Description of the Test Variables 

(1)+ Map Memory, CI505BX1: This test was designed to meas- 
ure pictorial memory. A schematic map is studied for four minutes, 
after which verbally stated printed questions must be answered re- 
garding it. 

(2) Figure Analogies, CI212AX1: This is a version of the 
well-known figure-analogies test. 

(3) Spatial Visualization II, CI203A: The examinee reads a 
verbal description of a solid block of wood, its sides painted different 
colors, being cut into smaller blocks. His task is to visualize these 
cutting operations so that questions can be answered regarding the 
resulting number of blocks of a given size and color. 

(4) Planning Air Maneuvers, CI408AX2: The problem is for
the examinee to ascertain the quickest and most economical way to 
“sky write” certain letter pairs. 

(6) Map Distance, CP626B: On a schematic map a number of towns are located around a given reference point. The examinee’s task is to estimate which one of any given pair of towns is closer to the reference point.

*For more complete descriptions of these tests including sample items and statistical data see (5).
+The number preceding each test name and the code number following it are the designations used by the Air Forces (5, pp. 901-902).

(7) Estimation of Length, CP631B: In the first section of this 
test the examinee must attempt to estimate which one of several giv- 
en lines is exactly the same length as one of five standards. In the 
second part he tries to pick the one line that is exactly double the 
length of a standard. 

(8) Speed of Identification, CP610C: Groups of five objects 
are shown, four of which in each group are identical with four ob- 
jects shown in an adjacent group. The examinee’s task is to desig- 
nate matching pairs of objects. 

(9) Memory for Plane Silhouettes, CI503AX1: The examinee
studies silhouettes of top and side views of four to eight airplanes for 
eighty seconds, then takes a two-minute matching test on the scram- 
bled views of the same airplanes. 

(10) Directional Orientation, CP515F: Items consist of pairs 
of circular sections from aerial photographs. Although both pictures 
in a pair are identical, one is rotated. The examinee’s task is to de- 
termine the compass direction of the rotated picture in relation to a 
compass direction marked on the unrotated picture. 

(11) Visualization of Maneuvers, CI657CX2: A single view of 
an airplane is pictured in a starting position. A simple maneuver is 
described, such as a turn or a bank of a certain number of degrees. 
The examinee’s task is to select the one of five alternative pictures 
that correctly portrays the airplane’s position following the pre- 
scribed maneuver. 

(12) Planning a Circuit, QP 901A-I: Each item presents an
electrical-circuit diagram with intersecting and intermeshed wires 
and several sets of terminals. The task is to trace the circuits vis- 
ually and to determine at which pair of terminals a battery should 
be placed in order to complete the circuit through a meter. 

(13) Path Tracing, QP 901A-V: Items are similar to those on 
the McQuarrie Pursuit Test. In each of several diagrams, lines run- 
ning in irregular fashion cross and recross one another. The ex- 
aminee’s task is to trace visually each line from its beginning and to 
mark its point of termination. 

(14) Maze Tracing, QP 901A-VII: In a full-page, complicated 
printed maze, a number of points in the pathways are marked by 
letter. The items are pairs of letters. The examinee’s task is to tell 
whether, in each item, the pathway between any two letters is clear 
or is blocked. 

(15) Formation Visualization, CP814A: Each item shows, in silhouette, a top and side view of a formation consisting of either two or three airplanes. The examinee’s problem is to select from five choices the one that portrays the formation from a front view.

(17) Visual Memory, CI514A: The examinee is given one minute to study a large aerial photograph. He then turns the page and
selects from among several small photographs those that duplicate 
portions of the large one. 

(18) Figure Classification, CI213AX1: The task is to select 
from five alternatives the geometric figure that has the characteris- 
tics common to the three stimulus figures that set the problem. 

(19) Spatial Visualization I, CI204AX1: For each item there are two or three illustrations to show, step by step, how a sheet of paper is folded and then cut. The examinee’s problem is to select the one of five alternative answers that correctly illustrates how the sheet will look when unfolded.

(20) Map Planning, CI412AX1: A diagrammatic map is shown
on which streets are represented, some of which are blocked by bomb 
damage. The examinee must plan quickly the shortest passable routes 
for military vehicles to travel through the damaged areas. 

(21) Object Recognition, CP523A: This test is a revision of 
Thurstone’s “Cubes.” The examinee’s task is to select the one of five 
cubes that portrays correctly a turned or rotated position of a given 
key cube. 

(23) Position Orientation, CP526B: This is an adaptation of 
Thurstone’s “Hands.” In each item are shown five drawings of hands, 
arms, legs, eyes, or feet. The examinee’s task is to determine quickly 
whether each drawing represents a right or a left member of the 
body. 

(24) Aerial Orientation, CP520C: For each item the stimulus 
is a cockpit view of a shoreline. Adjacent to each stimulus picture are 
five photographs of an airplane in different attitudes. The examinee’s 
problem is to match the cockpit view of the shoreline with the air- 
plane position from which that view would be seen. 

(25) Object Identification I, CP521A-I: This is a revision of 
Thurstone’s “Flags.” Silhouettes of planes, trucks, guns, tanks, and 
ships are presented. The examinee’s task is to select from five alter- 
native answers those rotated illustrations that show the same side of 
an object as that shown in a key illustration. 

(26) Object Identification II, CP521A-II: This form is similar 
to variable (25) except that flags are presented instead of planes, 
trucks, etc. 













(27) Plane Position Memory, CI512A: On each of several 
study pages nine airplanes are shown. Following a two-minute study 
period the same nine airplanes located differently on a page are shown 
to the examinee. His problem is to recall the row in which each airplane appeared on the study page and the direction in which it was headed.

(28) Decoding, CI214AX2: The test problems require the examinee to match groups of short words written in a code of signal flags with the same words written in the English alphabet.

(29) Route Planning, CI411AX1: The examinee must plan
paths successively from four points on the periphery of printed mazes 
to goal boxes in their centers. 

(30) Flight Formation, CI654AX5: The examinee must select 
from among five alternatives the position of three airplanes after cer- 
tain moves from a given formation have been described verbally. 

(31) Aerial Landmarks, CP525C: Paired aerial photographs 
are presented, one of which shows a vertical view and the other an 
oblique view of the same terrain. The examinee’s task is to find on 
the oblique view points given on the vertical view. 

(32) Pattern Assembly, CP804A: This is a paper-form-board 
type of test. Each item requires selection of the one of five assembled 
patterns that correctly represents how the unassembled parts, illus- 
trated in a separate panel, would look when fitted together. 

(33) Block Counting, CP512B-4: In various piles of blocks the 
examinee’s task is to count as rapidly as possible the number of ad- 
jacent blocks that are touching the sides of certain designated ones. 

(34) Discrimination Reaction Time (paper), CP634A-I & II: 
This test was designed to duplicate factorially in printed form the 
psychomotor test of the same name. The examinee responds to a pair 
of black-and-white dots by marking on a specially designed answer 
sheet in one of four directions—up, down, left, or right. Responses are 
determined by the relative positions of the dots. 

(35) Discrimination Reaction Time (paper), CP634A-III & 
IV: Parts III and IV call for the same type of responses as in Parts 
I and II, but the direction of marking is determined by the arrange- 
ment of three circles, one white, one black, and one with a cross. 

(36) Plane Name Memory, CI503AX2: In each of two parts, 
the examinee studies for four minutes a group of 20 plane silhouettes, 
the name of each appearing below it. Following the study period the 
silhouettes are shown on another page with five names listed under 
each. The examinee selects the name he thinks appeared originally 
with each silhouette. 
















(37) Planning a Course, CI406AX3: The problem in this test
consists of tracing lines through a diagram like that of city streets. 
The directions for the tracings are determined by a learned signal 
code. Signals are varied, however, throughout the maze, necessitat- 
ing changes in the mode of response as the maze is traversed. 

(38) Compass Orientation, CI660A: In each item one of the 
four compass directions, north, south, east, or west, is presented ver- 
bally as the initial flight direction of an airplane. Then a turn, either 
left or right, is specified. The examinee’s task is to record the new 
compass direction of flight. 

(39) Competitive Planning, CI409AX2: This test is based on
the familiar completion-of-squares game, sometimes called “Squares” 
or “Boxes.” In the test, the examinee must plan moves for two oppo- 
nents according to given rules, so that each completes as many 
squares as possible in a rectangular diagram, and must indicate how 
many squares are completed by each opponent. 

(40) Camouflaged Outlines, CP821A: This is a variation of 
Gottschaldt’s Figures test. The examinee’s task is to detect rapidly 
simple outline figures within complex designs. 

(41) Angle Estimation, CP218A: The test is composed of pho- 
tographs of military vehicles taken from the air at ten-degree-angle 
intervals ranging from zero to ninety. The examinee’s task is to judge 
the angle at which each photograph was taken. 

(42) Spatial Reasoning, CI211BX2: This is a revision of Thur- 
stone’s “Marks.” It requires the examinee to detect principles gov- 
erning the placement of letter symbols in spatial patterns of dashes 
and gaps. 

(51) Instrument Comprehension, CI616C*: Each item shows 
two airplane instruments, a compass and an artificial horizon, fol- 
lowed by photographs of an airplane in five different attitudes. The 
examinee must select the photograph showing the airplane in an atti- 
tude agreeing with the instruments’ readings. 

(52) Mechanical Principles, CI903B: This test is composed of 
items similar to those in the familiar Bennett and Fry “Mechanical 
Comprehension Test.” For each item the examinee must select from 
alternate answers the one that describes most accurately what is 
happening or will happen in a pictured situation. 

(53) Speed of Identification, CP610A: This test is similar to test (8) in this series except that the items are composed of airplane silhouettes, the perceptually distinguishable differences are not as gross, and in most paired views one is rotated.

*This test and those following were in the Aircrew Classification Battery.

(54) Numerical Operations I, CI701B: Items are composed of 
simple addition and multiplication problems of the true-false type, 
which are printed on an IBM answer sheet. 

(55) Numerical Operations II, CI701B: This part, a continua- 
tion of the preceding test, presents simple division and subtraction 
problems with five-choice answers. 

(59) Arithmetic Reasoning, CI206C: This test is composed of
arithmetical problems pitched at a level difficult enough to depend 
upon reasoning in the solutions. 

(63) Complex Coordination, CM701A: This is a psychomotor test in which patterns of lights are presented whose positions can be matched by making stick-and-rudder manipulations. Proper adjustments of the stick and rudder cause new light patterns to be presented automatically. The number of new patterns elicited within a given time interval is the examinee’s score.


Analysis of the Data 


The factor-analysis procedure. Factors were extracted from the 
46-by-46 correlation matrix by a combination of the multiple-group 
and the complete centroid methods (8). The multiple-group method
was considered applicable for the extraction of the first seven factors 
(numerical, perceptual speed, spatial relations, general reasoning, 
space II, visualization, and visual memory). Residuals were computed 
and the remaining factors were extracted by the centroid method.* 
Extractions were continued until the product of the two highest cen- 
troid loadings (.049) was not greater than the standard error of the 
original correlation of the same two variables (.047). Six centroid 
factors were extracted, which made a total of thirteen factors in all. 
Positions of the variables were then plotted and pair-by-pair orthog- 
onal rotations were made using Zimmerman’s simplified graphical 
method (9). 
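One step of the centroid method, together with the stopping rule stated above, can be sketched as follows. This is a simplified illustration, not the authors' computation: it assumes the communality estimates are already in the diagonal of the matrix and that every column sum is positive, so no variables need to be reflected (later centroid factors ordinarily do require reflection). The standard error is supplied by the caller; the function names and the one-factor test matrix are hypothetical.

```python
from math import sqrt


def centroid_factor(R):
    """Extract one centroid factor from a symmetric matrix R (list of lists)
    whose diagonal already holds communality estimates.

    Returns (loadings, residuals). Assumes all column sums are positive,
    so no variables need to be reflected.
    """
    n = len(R)
    col_sums = [sum(R[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("the matrix total must be positive")
    loadings = [s / sqrt(total) for s in col_sums]
    residuals = [[R[i][j] - loadings[i] * loadings[j] for j in range(n)]
                 for i in range(n)]
    return loadings, residuals


def should_stop(loadings, standard_error):
    """Stopping rule: the product of the two highest absolute loadings is
    not greater than the standard error of the corresponding correlation."""
    a, b = sorted((abs(x) for x in loadings), reverse=True)[:2]
    return a * b <= standard_error


# Hypothetical single-factor matrix: r_ij = a_i * a_j, communalities a_i**2.
a = [0.8, 0.7, 0.6, 0.5]
R = [[ai * aj for aj in a] for ai in a]
loadings, residuals = centroid_factor(R)
```

For a matrix generated by a single factor with positive loadings, one centroid step recovers those loadings and leaves zero residuals, which is a convenient check of the arithmetic.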

Independent rotational solutions were completed by two individ- 
uals. Criteria guiding the rotations included simple structure, positive 
manifold, psychological meaningfulness, and concordance with previous well-established factor-analysis findings. Tables 1, 2, 3, and 4 present the correlation matrix, the unrotated factor loadings, and the two sets of rotated factor loadings, respectively.

*The application of the multiple-group method included a rotation to an orthogonal reference frame. It was assumed that the spaces of the factors extracted by the two methods were mutually orthogonal. One indication that this is so is given by the fact that communalities before and after rotation of the two systems together agreed very well.
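The pair-by-pair orthogonal rotations can be illustrated numerically. The sketch below is not Zimmerman's graphical procedure; it assumes only that each step turns two factor axes through an angle chosen from a plot, and it uses hypothetical loadings and function names. It shows that such a rotation leaves every test's communality unchanged, which is why communalities can serve as a check on orthogonality, as in the footnote on the multiple-group method.

```python
from math import cos, sin, radians


def rotate_pair(F, i, j, degrees):
    """Rotate factor axes i and j of loading matrix F (one row per test)
    through the given angle; all other factor columns are unchanged."""
    t = radians(degrees)
    c, s = cos(t), sin(t)
    G = [row[:] for row in F]
    for old, new in zip(F, G):
        new[i] = c * old[i] + s * old[j]
        new[j] = -s * old[i] + c * old[j]
    return G


def communalities(F):
    """Communality of each test: the sum of its squared loadings."""
    return [sum(x * x for x in row) for row in F]


# Hypothetical loadings for three tests on three factors.
F = [[0.60, 0.30, 0.10],
     [0.20, 0.55, 0.40],
     [0.45, 0.10, 0.35]]
G = rotate_pair(F, 0, 1, 25)
```

Each rotation touches only two columns, so a full solution is built up as a sequence of such plane rotations, and the communalities stay fixed throughout.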

Description of the factors. In the first of the two solutions all 
thirteen factors were rotated. Centroid axis XIII, however, failed to 
yield a meaningful result. In the second solution, centroid axis XIII 
was discarded prior to rotation. No attempt was made to reconcile 
the two solutions. Although there are noteworthy differences between 
the two solutions, the twelve rotated factors are all sufficiently alike 
to warrant the same identifications. The most glaring difference is 
on the visual-memory factor (rotated factor IX) in which the lead- 
ing test in the second solution occupies a relatively insignificant po- 
sition in the first. Results from the two sets of rotations are described 
and compared in the factor descriptions that follow. Tests with load- 
ings of .30 or higher in either solution are listed under each factor. 

Rotated Factor I, Numerical. The three tests, Numerical Opera- 
tions II, Numerical Operations I, and Arithmetic Reasoning are 
weighted most heavily. No other primarily numerical tests were in- 
cluded among the forty-six variables analyzed. 








                                         Loadings
Test No.  Test Name                     I      II
  55      Numerical Operations II      .80    .73
  54      Numerical Operations I        ?     .70
  59      Arithmetic Reasoning         .46    .51
   8      Speed of Identification C    .36    .17
  42      Spatial Reasoning            .34    .32
  28      Decoding                     .23    .39
  23      Position Orientation         .32    .24
  38      Compass Orientation           ?     .18





Rotated Factor II, Perceptual Speed. The C form (experimental) of Speed of Identification is more heavily weighted than the A form (classification), as had been predicted (5). It is interesting to note that the simpler C form, despite its greater perceptual-speed loading, is more complex factorially, carrying in addition significant weights in both length-estimation and numerical factors. The loading for Speed of Identification A is notably less than had usually been found (.64) but is reasonably close to that carried in the Air Force analyses of the September 1944 Battery (.58).* Pattern Assembly appears with an unusually heavy perceptual-speed weight. Its perceptual content was evident in previous Air Force analyses but not so prominently (.26). In the first rotational solution the loading for Block Counting is in line with its previous Air Force weight of .43. In previous Air Force analyses Map Distance carried no weight in the perceptual-speed factor. Its presence here could suggest a possible correlation of the factor with that of length estimation.

                                         Loadings
Test No.  Test Name                     I      II
   8      Speed of Identification C    .58    .57
  53      Speed of Identification A    .53    .50
  32      Pattern Assembly             .46    .42
  33      Block Counting               .38    .29
   7      Estimation of Length         .28    .32
   6      Map Distance                 .32    .41

Rotated Factor III, Spatial Relations. As may be seen in the list of tests and loadings below, Aerial Orientation leads the other tests, with Visualization of Maneuvers C running a close second. Instrument Comprehension, ranking third in one solution and fourth in the other, and Complex Coordination, fifth in one and seventh in the other, were the best measures of Spatial Relations found in previous Air Force analyses. Aerial Orientation was developed in an attempt to measure the same space factor with even greater strength and purity, apparently with some success. Visualization of Maneuvers C is a less difficult and consequently more speeded version of the original Form A of the test. It was therefore expected to be a better measure of space than of the more intellectually difficult visualization, and this proved to be the case. The fact that Instrument Comprehension and Complex Coordination rank somewhat below their previous standings on the space factor suggests that the direction of the space axis in this analysis may be altered somewhat. In the second rotational solution the values for these tests approach their previous loadings more closely. The next most heavily weighted test, Formation Visualization, devised primarily as a measure of visualization, contributed variance almost equally to this factor and to the visualization factor described below.

                                         Loadings
Test No.  Test Name                     I      II
  24      Aerial Orientation           .61    .62
  11      Visualization of Maneuvers C .59    .58
  15      Formation Visualization      .44    .44
  51      Instrument Comprehension     .41    .47
   9      Memory for Plane Silhouettes .35    .26
  31      Aerial Landmarks             .34    .21
  63      Complex Coordination         .30    .40
  52      Mechanical Principles        .27    .36
  41      Angle Estimation             .24    .30

*An analysis of the intercorrelations of the September 1944 Classification Battery tests contained in the Sheppard Field Matrix (5).

Hypotheses regarding the nature of the spatial-relations and the 
visualization factors have been advanced by several investigators (1, 
2, 3, 4, 6, 7, 10, 12). With Aerial Orientation and Visualization of 
Maneuvers C leading all other tests by a substantial margin, empha- 
sis for this space factor seems to be placed upon empathic involve- 
ment and directional discrimination. The examinee must “place him- 
self” in the cockpit of the airplane and quickly determine the direc- 
tion of motion involved—right or left, up or down— depending upon 
a correct appraisal or “feeling” for the stimulus arrangement. Orien- 
tation is with respect to his own body. As might be expected this abil- 
ity proved to be one of the most prominent in the pilot-training cri- 
terion. 

Rotated Factor IV, Visualization. A separation between the vis- 
ualization and spatial-relations factors occurred first in Air Force 
analyses (11). The tests Mechanical Principles and Spatial Visuali- 
zation I represented the visualization factor best in those analyses 








                                          Loadings
Test No.   Test Name                      I      II
   3       Spatial Visualization II      .63    .60
  19       Spatial Visualization I       .61    .60
  52       Mechanical Principles         .60    .55
  41       Angle Estimation              .50    .45
  15       Formation Visualization       .45    .40
  11       Visualization of Maneuvers C  .44    .26
  21       Object Recognition            .41    .37
   2       Figure Analogies              .37    .41
  51       Instrument Comprehension      OT     .27
  31       Aerial Landmarks              .34    .28
  10       Directional Orientation       .34    .30
  59       Arithmetic Reasoning          .31    .33
  26       Object Identification II      .31    .28
  13       Path Tracing                  .30    .34
  40       Camouflaged Outlines          .30    .34
  29       Route Planning                .25    .33
  12       Planning a Circuit            .22    .32
  14       Maze Tracing                  .20    .38
















with Spatial Visualization II holding a lesser but still prominent po- 
sition. As may be seen in the list above, these three tests clustered 
together, leading all others in weights in this factor. The fourth most 
heavily loaded test, Angle Estimation, had once been considered as a 
potential representative of a new factor. It is interesting to note, 
therefore, its heavy weight in visualization. 

The leading visualization tests seem to involve a “mental” ma- 
nipulation of objects in space. It is usually necessary to move, turn, 
twist, or rotate an object or objects in imagination and to recognize 
a new appearance, position, or condition after prescribed manipula- 
tions. This visualization factor probably corresponds more closely 
than does that of spatial relations to Kelley’s and Thurstone’s space 
factor. 

Rotated Factor V, Reasoning. In previous Air Force analyses as 
many as three different reasoning factors have been described. The 
factor represented by the tests listed below appears to be a composite 
of at least two of these factors. 








                                          Loadings     AAF Factors for Reasoning
Test No.   Test Name                      I      II      I      II     III
  18       Figure Classification         .60    .51     .03    .16    .32
   2       Figure Analogies              .48    .36     .34    .40    .31
  59       Arithmetic Reasoning          .46    .35     .47     -      -
  30       Flight Formation              .45    .59     .16     -      -
  37       Planning a Course             .42    .34     .24     -      -
   3       Spatial Visualization II      .41    .26     .39
  42       Spatial Reasoning             .38    .36     .45    .05    .38
  21       Object Recognition            04     .27      -      -      -
  28       Decoding                      .31    .25     .26    .20    .87





Mathematics tests similar to Arithmetic Reasoning headed the 
list of the Air Force reasoning-I factor tests. Figure Analogies was 
one of the best tests of reasoning II, and Spatial Reasoning and De- 
coding were among the best tests of reasoning III. The first rotational 
solution is more congruent with the Air Force’s general-reasoning 
factor (reasoning I). 
















Rotated Factor VI, Paired-Associates Memory. This cluster of 
tests obviously represents memory of some kind. In previous Air
Force analyses Plane Name Memory was loaded heavily in two mem- 
ory factors, one undefined and the other called paired-associates mem- 
ory. Memory for Plane Silhouettes, however, was weighted only in 
paired-associates memory, which probably corresponds with Thur- 
stone’s factor M, or rote memory. 








                                          Loadings
Test No.   Test Name                      I      II
   9       Memory for Plane Silhouettes  .59    .59
  27       Plane Position Memory         .58    .59
  36       Plane Name Memory             .48    AT





Rotated Factor VII, Object Identification (doublet). In one 
analysis of AAF perceptual tests, a second space factor was distinguished
from the better-known spatial-relations factor (5). The two
tests loaded significantly on “Space II” were Thurstone’s Hands, and 
Flags, Figures, and Cards. Since Object Identification I and II are
variations of Flags, and Position Orientation is an adaptation of
Hands, it was assumed that these three tests might define a second
space factor in this analysis. But it can be seen in the following table
that only Parts I and II of Object Identification have high loadings,
which makes this factor a doublet specific to this test. The space II
factor was thus not verified by the AAF version of Thurstone’s Hands
test.

                                          Loadings
Test No.   Test Name                      I      II
  25       Object Identification I       .64    .62
  26       Object Identification II      .55    .57
  42       Spatial Reasoning             .30    .22
  23       Position Orientation          .29    .32

Rotated Factor VIII, Planning Speed. With the exception of 
Maze Tracing and Block Counting all of the tests in the list below 
had been analyzed previously in Air Force studies. In those studies, 
factors appeared involving several tests in the list. Planning Air 
Maneuvers, Planning a Course, Route Planning, and Spatial Reason- 
ing, in at least one analysis, received loadings of .33 and above on a 
factor called integration III. Integration III was described very
tentatively as the ability to keep in mind and integrate a number of
detailed instructions. Planning Air Maneuvers and Planning a Circuit
showed substantial loadings, along with a moderately loaded Map
Planning, on a factor labeled planning. The factor was not satisfactorily
interpreted, since other planning tests in the same analysis did
not show significant weight. Owing to the inclusion of a larger number
of planning tests in this battery, rotated factor VIII is probably
more stable than either of the two factors mentioned. The addition
of Maze Tracing to the analysis seems to have brought out the variance
that it holds in common with Planning a Circuit. The emphasis
has shifted away from Planning Air Maneuvers, which led on the
AAF’s planning factor. Since Planning Air Maneuvers
involves more complex and difficult problems than either Planning a
Circuit or Maze Tracing, speed is emphasized in the new factor. Thus
the term “planning speed” is suggested as a title. The ability represented
is obviously something more complex than speed in visual
tracing of lines or paths, since Path Tracing, the AAF’s counterpart
of the Pursuit test, is only moderately weighted.

                                          Loadings
Test No.   Test Name                      I      II
  12       Planning a Circuit            .58    .43
  14       Maze Tracing                  .50    .57
  42       Spatial Reasoning             .34    .31
  20       Map Planning                  .32    .38
  13       Path Tracing                  .29    .29
  33       Block Counting                .26    .38
  29       Route Planning                .26    .38
   4       Planning Air Maneuvers        .23    .28
  37       Planning a Course             .23    .28

Rotated Factor IX, Visual Memory. A visual-memory factor was 
anticipated because of the inclusion in the matrix of Map Memory,
which represented the factor best in Air Force analyses, along with
the new test Visual Memory, which was constructed for the special
purpose of measuring the factor that bears its name.

                                          Loadings
Test No.   Test Name                      I      II
   1       Map Memory                    .50    .43
  17       Visual Memory                 .36    .43
   2       Figure Analogies              .29    .37
   3       Spatial Visualization II      .26    .34
  27       Plane Position Memory         .25    .32
  31       Aerial Landmarks              .19    .50
  15       Formation Visualization       .04    .33
  11       Visualization of Maneuvers    .03    .37

This is the only factor in which the results differ greatly in the 
two rotational solutions. In the first solution, Map Memory holds the 
leading position with Aerial Landmarks ranked far down the list. In 
the second, Aerial Landmarks defines the factor best, with Map Mem- 
ory not far behind. 

The reader may well question why a disparity of this magnitude 
should exist between the two loadings for Aerial Landmarks. The an- 
swer probably lies in the difference in weight given by the two ana- 
lysts to the various guiding criteria. In solution I, probably greater 
weight was given to psychological meaningfulness and invariance 
when rotations were more or less indeterminate with respect to simple
structure or positive manifold. An examination of the content of
Aerial Landmarks would lead one to expect significant weights in 
such factors as perceptual speed, visualization, and spatial relations. 
It would seem logical that matching points on the two photographs 
should tap somewhat the same abilities measured by Spatial Orien- 
tation II, which also presents sections of aerial photographs to be 
matched.* Spatial Orientation II, an almost pure measure of percep- 
tual speed, contains items pitched at a difficulty level somewhat be- 
low that of Aerial Landmarks. The latter test adds to the perceptual 
element the complication of rotation and right-left determination, 
features believed to involve visualization and space, respectively. It 
is difficult to think of visual memory playing a dominant role in solving
Aerial Landmarks items. Visual memory is defined as the ability
to retain an impression of pictorial material (as if photographically)
and to recognize it after a short time interval (6). The vertical and
oblique aerial views in Aerial Landmarks involve two different retinal
pictures, so that a retained impression of one view could not simply be
recognized in the other.
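The disparity discussed above reflects a general property of orthogonal rotation that can be sketched numerically: turning a pair of reference axes changes a test's individual loadings but leaves its communality h² (the sum of its squared loadings) unchanged, so the correlations alone do not decide between the two positions. The Python sketch below is only an illustration under that assumption; the loadings (.19, .45) and the 20-degree angle are hypothetical, not values from Tables 3 and 4.

```python
import math

def rotate_pair(a: float, b: float, degrees: float) -> tuple[float, float]:
    """Rotate one test's loadings (a, b) on two orthogonal axes through an angle."""
    t = math.radians(degrees)
    return (a * math.cos(t) + b * math.sin(t),
            -a * math.sin(t) + b * math.cos(t))

def communality(loadings) -> float:
    """h^2: the sum of a test's squared loadings over the factors."""
    return sum(x * x for x in loadings)

# Hypothetical loadings of one test on two factors before rotation.
before = (0.19, 0.45)
after = rotate_pair(*before, degrees=20.0)

print("loadings before:", before, "after:", tuple(round(x, 2) for x in after))
print("h^2 before:", round(communality(before), 4),
      "after:", round(communality(after), 4))
```

Here a 20-degree turn raises the first loading from .19 to about .33 and lowers the second from .45 to about .36, while h² stays at about .24. Which position is preferred must then rest on criteria such as simple structure, positive manifold, and psychological meaningfulness.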

*See (5) for a description of Spatial Orientation II.

Rotated Factor X, Length Estimation. In several Air Force analyses
Pattern Assembly proved to be the best representative of a factor
labeled length estimation. The test Estimation of Length was constructed
in an attempt to improve the measurement of this factor.
As may be seen, Map Distance leads all other tests, with Estimation
of Length next in line and Pattern Assembly holding a position of
lesser importance.

                                          Loadings
Test No.   Test Name                      I      II
   6       Map Distance                  .56    .43
   7       Estimation of Length          .39    .36
  33       Block Counting                .32    .30
   8       Speed of Identification C     61     .26
  32       Pattern Assembly              .28    ol

It is interesting to note length-estimation variance in problems 
involving estimations of the number of pieces contained within a given 
dimension (as in Block Counting), and in estimation of comparative 
sizes of perceptual patterns (as in Speed of Identification C and Pat- 
tern Assembly). It is easier to rationalize strong length-estimation 
content in Map Distance and Estimation of Length than it is in Pat- 
tern Assembly, and to that extent the present findings make more 
“psychological sense” than do the AAF findings. 

Rotated Factor XI, Discrimination-reaction-time (doublet). The 
two forms of the test heading the list below were constructed in an 
effort to measure with a printed test the factors in the psychomotor 
test of the same name. The latter was characterized factorially by the 
following factors and loadings: spatial relations, .42; psychomotor 
precision, .35; perceptual speed, .22; and visualization, .20 (5). The 
printed tests failed almost completely to measure the factors of spa- 
tial relations, perceptual speed, and visualization, since the loadings 
in these factors were very small. 

Since there are no other known tests of psychomotor precision
in this matrix, it is possible that the two forms of Discrimination
Reaction Time have achieved the goal of measuring this factor. This
hypothesis is very unlikely, however, since previous results have
shown that there is some psychomotor-precision variance in other
tests in the present battery, and this factor did not emerge in
them in the present analysis. In developing the printed discrimination-reaction-time
tests, it was believed that the chief variance that
would carry over from the psychomotor form of the same test would
be in the spatial-relations factor. This expectation was based upon the
hypothesis that this space factor is essentially a matter of decision as to
direction of movement. The absence of spatial-relations variance in the
printed forms is evidence against that hypothesis. In this connection,
attention is called to the fact that the space factor is well measured
by printed tests of a different type.

                                                    Loadings
Test No.   Test Name                                I      II
  34       Discrimination Reaction Time I & II     .50    .50
  35       Discrimination Reaction Time III & IV   .46    .44
  29       Route Planning                          .37    .27
  27       Plane Position Memory                   .30    .24
  20       Map Planning                            .30    .28
  14       Maze Tracing                            .30    .05
  38       Compass Orientation                     .09    .38

Rotated Factor XII, Compass Orientation. Before the analysis 
there was speculation as to whether tests that involved the use of 
points of the compass would bring out another space factor or wheth- 
er their variances could be largely accounted for in terms of the well- 
established spatial-relations factor. The results are not very decisive 
concerning this question. With only a single substantially weighted 
test the factor must be explained, if at all, by the content of the 
test itself. Without supporting tests or previous knowledge of Com- 
pass Orientation, common variance cannot be claimed. It would be 
very interesting, however, to find that the appreciation of spatial ar- 
rangements is mediated by two abilities, one with reference to the 
body of the observer (spatial relations) and the other with refer- 
ence to compass points. The two frames of reference might be ex- 
pected to yield separate abilities. 








                                          Loadings
Test No.   Test Name                      I      II
  38       Compass Orientation           .66    .57
  41       Angle Estimation              O2     .08
  20       Map Planning                  .23    .32











Rotated Factor XIII, Residual. After rotation Block Counting
gained a loading of .32 on Axis XIII in the first solution, but no other
test approached even this low figure. Although a residual ideally
should show only negligible loadings, and this single loading of .32 is
somewhat larger, it is very doubtful that a significant factor is represented.


Conclusions 

1. In general, previously obtained factors, in the AAF results 
and elsewhere, were confirmed by this study, as were their relation- 
ships to specific tests. 

2. Better tests, in terms of increased factor loadings, seem to
have been developed for the factors of perceptual speed and spatial 
relations. Two previously developed experimental visualization tests 
were found to have higher loadings in that factor than had formerly 
been supposed. 










3. Tests designed as improved measures of the factors of length 
estimation, space II, and visual memory seem to be inferior to previ- 
ous ones for the same factors. 

4. Certain hypotheses concerning the nature of the spatial-re- 
lations and visualization factors were supported: (1) that the AAF 
spatial-relations factor is an ability to perceive relations of objects 
in space with respect to the observer’s body, an orientation in which 
the human body is the frame of reference; and (2) that visualization 
is the ability to manipulate visual objects mentally. 

5. The usual reasoning factors failed to separate, probably be- 
cause of an insufficient number and variety of definitive reasoning 
tests in the battery. 

6. The effort to reproduce the factorial composition of a psy- 
chomotor test (Discrimination Reaction Time) in printed forms 
failed almost completely. The measurement of most of the factors in- 
volved, however, has been achieved in various other printed tests. 

7. There is some indication of a new space factor in which ori- 
entation depends upon the compass points as a frame of reference. 
This hypothesis is worth serious investigation. 


REFERENCES 

1. Comrey, A. L. A factorial study of achievement in West Point courses. Educ.
psychol. Meas., 1949, 9, 193-209.

2. Dudek, F. The dependence of factorial composition of aptitude tests upon
population differences among pilot trainees. Educ. psychol. Meas., 1948, 8,
613-683; 1949, 9, 95-104.

3. Fruchter, B. The nature of verbal fluency. Educ. psychol. Meas., 1948, 8,
33-47.

4. Fruchter, B. Factorial content of right-response and wrong-response scores
in a battery of experimental aptitude tests. Unpublished Ph.D. dissertation,
University of Southern California, 1948.

5. Guilford, J. P. (Ed.) Printed classification tests. Army Air Forces Aviation
Psychology Research Program, Report No. 5. Washington: U. S. Government
Printing Office, 1947.

6. Guilford, J. P., and Zimmerman, W. S. Some AAF findings concerning aptitude
factors. Occupations, 1947, 26, 154-159.

7. Michael, W. B. Factor analyses of tests and criteria: a comparative study
of two AAF pilot populations. Psychol. Monog., 1949, 63, No. 298.

8. Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. of Chicago Press,
1947.

9. Zimmerman, W. S. A simple graphical method for orthogonal rotation of
axes. Psychometrika, 1946, 11, 51-55.

10. Zimmerman, W. S. Isolation, definition, and measurement of spatial and
visualizing abilities. Unpublished Ph.D. dissertation, University of Southern
California, 1948.

11. Zimmerman, W. S. Visualization. Chapter 12 in J. P. Guilford (Ed.),
Printed classification tests. Washington: U. S. Government Printing Office,
1947.

12. Zimmerman, W. S., and Howe, J. A., Jr. Spatial tests. Chapter 19 in J. P.
Guilford (Ed.), Printed classification tests. Washington: U. S. Government
Printing Office, 1947.


Manuscript received 10/16/50. 
Revised manuscript received 3/17/51. 








TABLE 1
Intercorrelation Matrix*
[Matrix entries not recoverable from the scanned pages.]

*Decimal points omitted.











TABLE 2 
Multiple-Group (I through VII) and Centroid Loadings (VIII through XIII)* 
Test
No.    I   II  III   IV    V   VI  VII VIII   IX    X   XI  XII XIII   h²
~~ 1 26 38 28 O8 20 38 12 04 05 16—10 —15—17 51 
2. 37 29 32 39 30 10—06 05 05 18 O02 OT —05 62 
3. 38 36 34 55 11 05 00 07 —02 21 04 08 —08 17 
4. 12 18 26 28 20—03 01 ig 10 —08 —04 06 28 
6. 3 400 05—03 04 —02 O1 10 —25 07 —29 18—14 48 
a6 3 34 —04 —09 14 —01 01 15 —23 09—09 08 —04 35 
8. 48 51 —15 —08 27 —08 —05 20 —i2 —05 —07 07 —04 = =—«67 
9. 08 58 29 06 —02 29 08 —10 23 —28 —08 O07 09 =«67 
10. 31 39 8631 24 10 —04 12 22 18 —08 07 11 —08 52 
il. 20 41 58 14 02—05 14 17 #19 «05 16 24 10 += «75 
12. 14 37 35 25 24 —04 18 15 12 —28 —12 —08 —16 56 
18. 16 28 21 27 08 O07 O9 28 —10 —15—10 12—05 38 


14, 18 30 24 28 29-02 08 40 08 —10 —14-—12—18 59 
15. 17 40 54 27 09-09 O09 O09 17-05 10 04 O02 64 
vA 20 38 O09 08 16 2 04 10 24 08 —12—12 08 38 
18. 146 23 26 16 40 —01 6 —20—08 14 08 22-—16 48 
19. 20 46 38 48 08 O06 03 —02 —02 04 —08 —04—04 65 
20. 20 «35 3 11 33 05 17 21 —18 —05 —03 —21—15 47 
21. 32 21 36 34 #12 ##OF 14—02—03 10 138 OT -—08 47 
23. 39 22 24—03—01 05 28 10 —20 —12 3 —06 18 44 
24, 17 36 67 —04 08 —03 05 05 OF OF O02 08 —02 68 
25. 31 29 25 18 OF O08 58 —04 —04 —02—05 05 05 64 
26. 36 «31 220 «623806~«6(08)— (05S 50 —05 —09 03 —09 —06 —05 61 
27. 18 3 17—04 12 56 O01 14 18-15 12 06 22 = 62 
28. 44 33 10 15 26 O08 04 i1 O8—08 15 O7—19 49 
29. 18 27 22 25 24-01 00 30 —05 —05 3—14 O07 42 
30. 35 238 20 O07 52 —02 —05 —07 —04—138 13—07 05 55 
31. 15 46 31 19 O08 09 10 04 19 05 24—10 11 50 
32. 16 50-—08 17 14 OO 08 O08 —08 22-05 18 16 46 
33. 33 48 22 16 25-06 038 31—14 10—08 —09 15 64 
34, 338 27 02—06 18 14 O06 22—24-—20 23 04—038 44 
35. 3380 24.0 10 —06 23 —01 08 28 —16—15 22 —05 —07 42 


40. 24 35 16 31 22-—02 11 08 —i17 —11—05 16 08 44 
41. 0 32 388 24—15 08 —06 06--11 10 22—15 17 46 
42. 49 11 26 17 42—07 16 05 —03 —05 —10 —03 02 57 
51. 24 36 49 06—08 03 00 —05 —06 —06 04 —02 —04 46 
52. 04 24 52 41-07 O06 O08 O02 02 O05 08 O05 io . 62 
53. 20 638 O00 00 00 00 02 O01 00 O01 00 OO—O1 44 
54. 80 —02 —03 —05 00 02 06 02 —08 —05 —03 05 —02 65 
55. 81 02 03 05 00 —02 —04 —03 —03 —03 06 —06 —10 69 
59. 56 O01 29 34 17 00 —12 2 04 08 10 06—08 58 
68. 17 38 35 04 O2—07 14 14 —14 —08 —02 05 03 88 


*Decimal points omitted. 

















TABLE 3
Rotated (Orthogonal) Factor Loadings (Solution I)*








Test 


VI VII Vill IX 


XR Ann 





31, 04 25 34 34 «11 
32. 00 46—05 23 12 
33. 15 38 11 25 = 16 
34, 27 18 O01—03 05 
35. 26 19 11—04 11 
36. 09 O05 O04 O02 17 
37. 29 14 16—03 42 
38. 382 09 O08 01 24 
39. 23 11 10 12 = 22 
40. 11 22-05 30 28 
41. —01 04 24 50 00 
42. 384 09 04 O06 38 
51. 146 08 40 30 11 
52. —06—06 26 58 20 
53. 11 538 16 14 = 00 
54, 77,04 02—04 08 
55. 80Y 08 O01 O07 16 
59. 45—04 07 30 48 
63. 06 14 27 28 ~~ «02 


146 138 18 = «650 


—03 04 06 12 


24 04 
16 =610 
19 09 
08 07 
56 02 
39 «15 
31 24 
05 = 01 
09 826 
14 20 
04 17 
25 = =28 
12 30 
00 14 
06 00 
16 04 
16 —04 
20 30 
11 11 
24 20 
22 06 
19 07 
21 +01 
07 30 
03 26 
038 = 37 
00 20 
—10 14 
28 06 
32 24 
16 50 
11 46 
26 = =«05 
06 16 
17 09 
09 08 
24 19 
04 = «11 
15 «11 
19 04 
06 06 
16 04 
21 05 
09 02 
05 06 
30 21 


19 
03 


67 


> 02 
14 


h2 
51 
62 
16 ~ 
27 
49 
35 
67 
67 
52 
fi a 
56 
37 
58 
64 
37 
49 
65 
47 
47 
44 
62 
65 
61 
61 
51 
41 
54 
49 
46 
63 
44 
41 
44 
46 
T4— 
27 
44 
47 
57 
46 
52 
438 
67 
70- 
58 
38 





*Decimal points omitted. 

















TABLE 4
Rotated (Orthogonal) Factor Loadings (Solution II)*
Test
No.    I    II   III   IV    V    VI   VII  VIII  IX    X    XI   XII
 1.   03   00   14   13   04   26   10   18   43   27   00   26
 2.   20   15   16   41   36   06   00   13   37   25   05   14
 3.   14   15   18   60   26  -01   08   04   34   23   07   04
 4.   01   07   14   26   17  -04   01   28   24   12  -05   05
 6.   06   41   22   07   00   08   12   06  -03   42   05   15
 7.   08   32   09  -03   01   00   07   12   05   36   21   16
 8.   hg   57   00  -05   07   05   06   25   11   26   29   23
 9.  -07   26   26   20   06   59   15   09   29  -11   03   11
10.   18   25   27   30   21   11   17   29   27   04   17  -04
11.   12   14   58   26   28   12   12   18   37   03   17  -11
12.  -01   15   21   32   19   16   24   43   18  -05   02   17
13.   05   17   15   34   11   “f   17   29   00   19   14  -03
14.   02   16   09   33   3    06   11   57   19   3    05   08
15.   07   11   44   40   25   10   11   Zo   33  -09   10   05
17.   02   19   01   11   01   23   04   24   43   12  -05   12
18.   01   12   14   15   51   00   08  -08   20   17   11   24
19.   02   19   23   60   15   12   14   10   29   12   02   19
20.  -04   09   3    18   05   04   20   38   16   19   23   32
21.   22   01   21   37   27   04   18   01   28   14   12   09
23.   24   oO   26   07  -04   11   32   16   06   15   24   17
24.   01   01   62   17   19   3    01   18   29   06   06   16
25.   13   05   22   17   3    13   62   09   25   14   07   12
26.   16   08   20   28   3    06   57   07   25   17   03   18
27.   09   01   02   05   09   59   00   16   32   13   24   02
28.   29   26   5}   15   25   2    09   20   28   08   25   10
29.   06   06   07   33   138 -01   02   38   19   08   27   at
30.   24   19   09   06   59   05   00   16   15   07   20   09


70 20 02 —09 04 02 14 04 3 24 08 11 

3 20 02 05 03 —03 05 02 14 14 09 17 
51 07 10 33 85 —04 —05 03 21 14 02 10 
00 16 40 20 06 10 18 19 07 11 20 11 





*Decimal points omitted. 














PSYCHOMETRIKA—VOL. 17, NO. 1 
MARCH, 1952 


THE EFFECT OF DIFFICULTY AND CHANCE SUCCESS ON 
ITEM-TEST CORRELATION AND ON TEST RELIABILITY* 


LYNNETTE B. PLUMLEE 
EDUCATIONAL TESTING SERVICE 


An equation is derived for predicting the effect of chance suc- 
cess, relative to item difficulty, on item-test correlation. The values 
predicted by this equation and by equations derived by Guilford and 
Carroll for predicting the effect of chance success on item difficulty 
and test reliability are compared with empirical values in an experi- 
ment which used identical test items in multiple-choice and answer- 
only form. 


Introduction 


The “multiple-choice” type of test, in which answer options are 
supplied, frequently has been objected to on the grounds that an ex- 
aminee who does not know the answer to any of the items in a test can 
make a substantial score by pure guesswork and that his score there- 
fore does not represent his true knowledge. In support of the multiple- 
choice test, on the other hand, it has been argued that this type of test 
can be made both a more effective and a more efficient measuring instrument
than the “answer-only” type of test, in which no answer options
are given and which thus requires the examinee to supply his
own answers. Since wrong answer options can be restricted to an- 
swers resulting from certain types of errors, the multiple-choice item 
can be directed towards testing specific types of errors. The multiple- 
choice test places the burden of decision regarding the correctness 
of an answer on the examinee rather than on the scorer, and greater 
consistency in scoring is thus possible in this type of test than in the 
“answer-only” test. 
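The objection that pure guesswork can yield a substantial score is easy to quantify. A minimal Python sketch, assuming that an examinee who knows none of k items guesses independently among n equally attractive options on each: the number-right score is then binomial, with mean k/n and standard deviation √(k(1/n)(1 − 1/n)). The 100-item, four-option test is a hypothetical example.

```python
import math

def chance_score(k: int, n: int) -> tuple[float, float]:
    """Mean and SD of the number-right score of an examinee who knows none of
    k items and guesses independently among n equally attractive options."""
    r = 1.0 / n                          # chance of a lucky guess on one item
    mean = k * r                         # binomial mean
    sd = math.sqrt(k * r * (1.0 - r))    # binomial standard deviation
    return mean, sd

# Hypothetical 100-item test with four options per item.
mean, sd = chance_score(k=100, n=4)
print(f"expected chance score {mean:.1f}, SD {sd:.2f}")
```

For such a test the expected chance score is 25 items, with a standard deviation of about 4.3.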

One aspect of the problem is investigated in this paper: What is
the theoretical effect of chance success on item-test correlation and
on test reliability, and to what extent is this theoretical expectation
borne out in practice?

*Condensation of a dissertation presented in partial fulfillment of the requirements
for the Ph.D. degree to the University of Chicago. Grateful acknowledgment
is made to Professor Harold Gulliksen for his guidance as thesis advisor
and to Professor L. L. Thurstone and Dr. D. W. Fiske of the University
of Chicago, who served as members of the thesis committee. The author is also
indebted to Professor S. S. Wilks for review of the derivations and development
of statistical tests used in the thesis, to Dr. L. R. Tucker for technical advice,
and to Dr. W. G. Mollenkopf for critical comments on the derivations and
interpretations. The writer expresses appreciation to the Educational Testing
Service for making available its technical facilities, and to the University of
Chicago for the flexible administrative arrangement which made this thesis possible.

Several previous investigations (1-13, 15) have been concerned 
with the prediction of the effect of chance success on item difficulty, 
inter-item correlation, item-test correlation, and test reliability. 

In 1936, Guilford (3) presented a formula for the proportion of examinees who know the answer to an item, $p$, as a function of the proportion of examinees who answer the item correctly in multiple-choice form, $p'$, and the number of answer options, $n$:

$$p = \frac{np' - 1}{n - 1}. \qquad (1)$$

This formula assumes that all answer options are equally attractive to the examinee who does not know the correct answer.
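A minimal Python sketch of equation (1), together with its algebraic inverse (the form the paper later gives as equation (8)), is shown below. The function names are illustrative, and the sketch assumes, as the formula does, that every option is equally attractive to an examinee who does not know the answer, so that the chance success rate is $R = 1/n$.

```python
def proportion_knowing(p_mc: float, n_options: int) -> float:
    """Equation (1): proportion p who know the answer, estimated from the
    multiple-choice proportion correct p' and the number of options n."""
    if n_options < 2:
        raise ValueError("an item needs at least two answer options")
    return (n_options * p_mc - 1) / (n_options - 1)


def proportion_correct_mc(p_known: float, n_options: int) -> float:
    """Inverse of equation (1): p' = W p + R, with R = 1/n and W = 1 - R."""
    chance_right = 1 / n_options          # R
    chance_wrong = 1 - chance_right       # W
    return chance_wrong * p_known + chance_right


if __name__ == "__main__":
    p = proportion_knowing(0.60, 5)
    print(round(p, 3))                             # 0.5
    print(round(proportion_correct_mc(p, 5), 3))   # 0.6
```

An item answered correctly by exactly $1/n$ of the examinees yields $p = 0$: chance alone accounts for every correct answer.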

Carroll (1) considered the combined effect of item difficulty and 
chance success on the Pearsonian correlation between items or be- 
tween sets of items. Using the binomial distribution of chance fail- 
ure (number-not-right) scores, he arrived at the following formulas 
for mean, standard deviation, inter-test correlation, and inter-item 
correlation, respectively: 





$$\bar{E}' = W\bar{E}, \qquad (2)$$

$$\sigma_{E'}^2 = W^2\sigma_E^2 + RW\bar{E}, \qquad (3)$$

$$r_{E_1'E_2'} = \frac{W r_{E_1E_2}\,\sigma_{E_1}\sigma_{E_2}}{\sqrt{W^2\sigma_{E_1}^2\sigma_{E_2}^2 + RW\bar{E}_1\sigma_{E_2}^2 + RW\bar{E}_2\sigma_{E_1}^2 + R^2\bar{E}_1\bar{E}_2}}, \quad\text{and} \qquad (4)$$

$$r_{1'2'} = \frac{W r_{12}\sqrt{q_1(1 - q_1)\,q_2(1 - q_2)}}{\sqrt{q_1 q_2\,(1 - Wq_1)(1 - Wq_2)}}, \qquad (5)$$

where $E$ represents the failure score and $\bar{E}$ its mean, $W$ is the probability of failure and $R$ the probability of success on the basis of chance alone in multiple-choice form, a prime denotes a multiple-choice test, absence of a prime denotes an answer-only test, subscripts 1 and 2 denote two different items or tests, and $q$ is the probability of failing a given item in answer-only form. (Notations used by the present author have been substituted for some of those used by Carroll.)

It is the plan of the present paper to develop the equation for























LYNNETTE B. PLUMLEE 71 


predicting the effect of chance success on item-test correlation (bi- 
serial coefficient of correlation) and to compare the values predicted 
by this equation and equations (1) and (4) with values actually ob- 
tained in an experiment which used identical test items in multiple- 
choice and answer-only form. 
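Carroll's equation (5) can be checked against a direct computation. The Python sketch below assumes, as the chance-success model does, that an item not known is failed in multiple-choice form with probability $W$, independently for each item, so that the multiple-choice failure rates become $Wq_1$, $Wq_2$, and $W^2 q_{12}$. Here $q_{12}$, the proportion failing both items in answer-only form, is an illustrative input that does not appear in the paper's notation, and the numerical values are hypothetical. The phi coefficient computed from those rates should equal the value given by (5).

```python
import math


def phi_from_failures(q1: float, q2: float, q12: float) -> float:
    """Pearson (phi) correlation between two 0/1 items, computed from the
    proportions failing item 1, item 2, and both items."""
    return (q12 - q1 * q2) / math.sqrt(q1 * (1 - q1) * q2 * (1 - q2))


def carroll_mc_phi(r12: float, q1: float, q2: float, w: float) -> float:
    """Equation (5): multiple-choice inter-item correlation from the
    answer-only correlation r12 and answer-only failure rates q1, q2."""
    num = w * r12 * math.sqrt(q1 * (1 - q1) * q2 * (1 - q2))
    den = math.sqrt(q1 * q2 * (1 - w * q1) * (1 - w * q2))
    return num / den


if __name__ == "__main__":
    q1, q2, q12, w = 0.40, 0.55, 0.30, 0.8     # hypothetical values
    r_ao = phi_from_failures(q1, q2, q12)
    # A failure survives chance with probability W, independently per item.
    r_direct = phi_from_failures(w * q1, w * q2, w * w * q12)
    print(round(r_ao, 4), round(r_direct, 4),
          round(carroll_mc_phi(r_ao, q1, q2, w), 4))
```

The last two printed values agree, and both are smaller than the answer-only correlation.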


The Prediction Equation 

In deriving the equation, the following assumptions will be 
made: 

1. that every examinee attempts every item in the test, 

2. that every examinee who knows the correct answer to an 
item answers the item correctly in both multiple-choice and answer- 
only form, 

3. that every examinee who does not know the correct answer 
to an item answers the item incorrectly in answer-only form and 
chooses from among the options on a basis of chance alone in mul- 
tiple-choice form, and 

4. that the number of options per item is the same for all items 
in the multiple-choice form of the test. 


Although the first three assumptions are rarely if ever borne 
out in an actual test situation, they are necessary in deriving relation- 
ships between a hypothetical test on which results are not influenced 
by chance success and one in which chance success is fully operative. 
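The model defined by these assumptions is easy to simulate. The following Python sketch is illustrative only: the function name and parameter values are hypothetical. It draws examinees who either know an item or guess blindly among $n$ equally attractive options, and the simulated multiple-choice proportion correct can be compared with $Wp + R$.

```python
import random


def simulate_mc_proportion(p_known: float, n_options: int,
                           n_examinees: int, seed: int = 0) -> float:
    """Simulate one item under assumptions 1-4: every examinee attempts it,
    knowers answer correctly, non-knowers guess uniformly among options."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_examinees):
        if rng.random() < p_known:            # examinee knows the answer
            correct += 1
        elif rng.randrange(n_options) == 0:   # blind guess hits the keyed option
            correct += 1
    return correct / n_examinees


if __name__ == "__main__":
    n, p = 5, 0.45
    w, r = 1 - 1 / n, 1 / n
    print(simulate_mc_proportion(p, n, 200_000), w * p + r)
```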

The following notations will be used (the first symbol refers to the multiple-choice form of the test, the second to the answer-only form):

$R$ (multiple-choice form only): Probability of answering an item correctly on the basis of chance alone.
$W$ (multiple-choice form only): Probability of answering an item incorrectly on the basis of chance alone, where $W = 1 - R$.
$h$, $h$: Number of items in the test.
$x'$, $x$: Score (number of items correct) on the test.
$x'_k$, $x_k$: Score of $k$th individual.
$a'_{kj}$, $a_{kj}$: Score of $k$th individual on $j$th item, where $a = 0$ or $1$ depending on whether the item is answered incorrectly or correctly.
$t$, $t$: Number of individuals who take the test.
$t'_+$, $t_+$: Number of individuals who answer a given item correctly.
$t'_-$, $t_-$: Number of individuals who answer a given item incorrectly.
$p'$, $p$: Proportion of all examinees who answer a given item correctly.
$M_{x'}$, $M_x$: Mean score on the test.
$M_{x'+}$, $M_{x+}$: Mean score of those individuals who answer a given item correctly.
$M_{x'-}$, $M_{x-}$: Mean score of those individuals who answer a given item incorrectly.
$\sigma_{x'}$, $\sigma_x$: Standard deviation of scores on the test.
$z'$, $z$: Ordinate of the normal probability curve at a point to the left of which lies a proportion $p'$ or $p$ of the area under the curve.
$r_{x'j'}$, $r_{xj}$: Biserial correlation between a given item and total test, where

$$r_{xj} = \frac{M_{x+} - M_x}{\sigma_x}\cdot\frac{p}{z} \qquad (6)$$

and

$$r_{x'j'} = \frac{M_{x'+} - M_{x'}}{\sigma_{x'}}\cdot\frac{p'}{z'}. \qquad (7)$$
















An expression can be found for $r_{x'j'}$ in terms of $r_{xj}$ if we know the values of $M_{x'+}$, $M_{x'}$, $\sigma_{x'}$, and $p'$ in terms of the corresponding answer-only statistics.

Guilford expressed $p$ in terms of $p'$ as shown in equation (1). An equivalent formula for $p'$ in terms of $p$ is

$$p' = Wp + R. \qquad (8)$$


Carroll's formulas for the mean, (2), and standard deviation, (3), can be expressed in terms of number-right scores as follows:

$$M_{x'} = WM_x + Rh, \quad\text{and} \qquad (9)$$

$$\sigma_{x'} = \sqrt{W^2\sigma_x^2 + RW(h - M_x)}. \qquad (10)$$





It will be noted that as $\sigma_x^2$ becomes very large relative to $h - M_x$, or as $W$ approaches 1,

$$\lim \frac{\sigma_{x'}}{\sigma_x} = W. \qquad (11)$$

If, for an answer-only form of a standardized test, the mean is equal to about one-half the range and the standard deviation is equal to about one-fifth the range, where the range approximates the total number of items, then $\sigma_x^2$ grows as $h^2$ while $h - M_x$ grows only as $h$, and the standard deviation of the multiple-choice form of the test, $\sigma_{x'}$, will theoretically approach $W\sigma_x$ as the number of items becomes large or as the number of answer options becomes large.
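A short numerical illustration of (9) through (11) is sketched below. It uses the hypothetical conditions just described: mean $h/2$, standard deviation $h/5$, and five options, so that $W = .8$. In that case the ratio $\sigma_{x'}/\sigma_x$ works out to $\sqrt{W^2 + 25RW/(2h)}$. This gives about .834 for $h = 36$, and the ratio falls toward .80 as $h$ grows.

```python
import math


def mc_mean_sd(mean_ao: float, sd_ao: float, h: int, n_options: int):
    """Equations (9) and (10): multiple-choice mean and standard deviation
    predicted from the answer-only number-right mean and standard deviation."""
    chance_right = 1 / n_options          # R
    w = 1 - chance_right                  # W
    mean_mc = w * mean_ao + chance_right * h
    sd_mc = math.sqrt(w**2 * sd_ao**2 + chance_right * w * (h - mean_ao))
    return mean_mc, sd_mc


if __name__ == "__main__":
    for h in (36, 100, 1000):             # hypothetical test lengths
        _, sd_mc = mc_mean_sd(h / 2, h / 5, h, 5)
        print(h, round(sd_mc / (h / 5), 4))   # ratio sigma_x' / sigma_x
```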
To find $M_{x'+}$ in (7), we note that the sum of answer-only scores of those $t_+$ individuals who answer item $j$ correctly in answer-only form is

$$\sum_{k=1}^{t} x_k a_{kj}. \qquad (12)$$

From (9) it will be seen that

$$\sum_{k=1}^{t} x'_k = W\sum_{k=1}^{t} x_k + Rht. \qquad (13)$$

Therefore, the sum of multiple-choice scores of the $t_+$ individuals is

$$W\sum_{k=1}^{t} x_k a_{kj} + Rht_+. \qquad (14)$$

The sum of answer-only scores of those $t_-$ individuals who answer item $j$ incorrectly in answer-only form is













$$\sum_{k=1}^{t} x_k(1 - a_{kj}). \qquad (15)$$

Of these $t_-$ individuals, a fraction $R$, selected at random, will answer item $j$ correctly in multiple-choice form, and the sum of multiple-choice scores for the latter group will be

$$RW\sum_{k=1}^{t} x_k(1 - a_{kj}) + R^2ht_-. \qquad (16)$$

Therefore, the sum of multiple-choice scores for all persons who answer item $j$ correctly in multiple-choice form will be the sum of (14) and (16):

$$\sum_{k=1}^{t} x'_k a'_{kj} = W\sum_{k=1}^{t} x_k a_{kj} + RW\sum_{k=1}^{t} x_k - RW\sum_{k=1}^{t} x_k a_{kj} + Rht_+ + R^2ht_-. \qquad (17)$$

Since $t_- = t - t_+$ and $W = 1 - R$,

$$\sum_{k=1}^{t} x'_k a'_{kj} = W^2\sum_{k=1}^{t} x_k a_{kj} + RW\sum_{k=1}^{t} x_k + RWht_+ + R^2ht. \qquad (18)$$

Since

$$\sum_{k=1}^{t} x_k a_{kj} = t_+M_{x+} \quad\text{and}\quad \sum_{k=1}^{t} x_k = tM_x,$$

then

$$\sum_{k=1}^{t} x'_k a'_{kj} = W^2t_+M_{x+} + RWtM_x + RWht_+ + R^2ht. \qquad (19)$$

Dividing both sides of (19) by $t'_+$, and factoring out $t$ in the right-hand side of the equation, we obtain


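$$M_{x'+} = \frac{t}{t'_+}\left(W^2pM_{x+} + RWM_x + RWhp + R^2h\right). \qquad (20)$$

Noting from (9) and (8) that

$$M_{x'} = \frac{p'}{p'}\,(WM_x + Rh) = \frac{1}{p'}(Wp + R)(WM_x + Rh) = \frac{1}{p'}\left(W^2pM_x + RWM_x + RWhp + R^2h\right) \qquad (21)$$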
















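and that

$$\frac{t}{t'_+} = \frac{1}{p'},$$

we have

$$M_{x'+} - M_{x'} = \frac{1}{p'}\left(W^2pM_{x+} - W^2pM_x\right) = \frac{1}{p'}\left[W^2p\,(M_{x+} - M_x)\right]. \qquad (22)$$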
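Because the steps from (20) through (22) are pure algebra, they can be checked numerically for arbitrary inputs. In the Python sketch below, the input values are hypothetical, and $t/t'_+$ is written as $1/p'$ with $p'$ taken from (8). The difference between (20) and (9) should agree with the right-hand side of (22) to within rounding error.

```python
def mean_correct_mc(p, mean_plus, mean_all, h, w):
    """Equation (20), with t / t'_+ written as 1 / p' and p' = W p + R."""
    r = 1 - w                              # R
    p_mc = w * p + r                       # equation (8)
    return (w * w * p * mean_plus + r * w * mean_all
            + r * w * h * p + r * r * h) / p_mc


def mean_mc(mean_all, h, w):
    """Equation (9): M_x' = W M_x + R h."""
    return w * mean_all + (1 - w) * h


if __name__ == "__main__":
    p, mean_plus, mean_all, h, w = 0.55, 40.0, 34.0, 70, 0.8   # hypothetical
    lhs = mean_correct_mc(p, mean_plus, mean_all, h, w) - mean_mc(mean_all, h, w)
    rhs = w * w * p * (mean_plus - mean_all) / (w * p + 1 - w)  # equation (22)
    print(round(lhs, 6), round(rhs, 6))
```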


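Since from (6)

$$M_{x+} - M_x = \frac{r_{xj}\,\sigma_x z}{p}, \qquad (23)$$

substitution of (22) and (23) in (7) gives

$$r_{x'j'} = W^2\,\frac{\sigma_x}{\sigma_{x'}}\cdot\frac{z}{z'}\,r_{xj}. \qquad (24)$$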


Since the ratio $z/z'$ is constant for any given values of $p$ and $R$, it may easily be obtained from a table or graph prepared for such a purpose. F. M. Lord has shown that, as $p$ increases from 0 to 1, $z/z'$ increases with decreasing acceleration from 0 to a limit, $1/W$.
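Instead of using a table or graph, the ratio $z/z'$ and the prediction (24) can be computed directly. The Python sketch below uses the standard normal distribution from the standard library and takes $\sigma_{x'}$ from (10). The inputs in the usage lines are hypothetical. Since $\sigma_{x'} \geq W\sigma_x$ and $z/z' < 1/W$, the predicted multiple-choice biserial is always below the answer-only value.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ordinate(prop: float) -> float:
    """Normal ordinate at the point with proportion `prop` of the area to its left."""
    return STANDARD_NORMAL.pdf(STANDARD_NORMAL.inv_cdf(prop))


def predicted_mc_biserial(r_ao, p, sd_ao, mean_ao, h, n_options):
    """Equation (24): r_x'j' = W**2 * (sigma_x / sigma_x') * (z / z') * r_xj,
    with p' from equation (8) and sigma_x' from equation (10)."""
    chance_right = 1 / n_options                 # R
    w = 1 - chance_right                         # W
    p_mc = w * p + chance_right                  # equation (8)
    sd_mc = math.sqrt(w**2 * sd_ao**2 + chance_right * w * (h - mean_ao))
    return w**2 * (sd_ao / sd_mc) * (ordinate(p) / ordinate(p_mc)) * r_ao


if __name__ == "__main__":
    w = 0.8
    for p in (0.1, 0.5, 0.9, 0.99):
        # z/z' rises toward the limit 1/W = 1.25 stated above
        print(p, round(ordinate(p) / ordinate(w * p + 1 - w), 3))
    # hypothetical item and score statistics
    print(round(predicted_mc_biserial(0.60, 0.5, 10.6, 31.6, 70, 5), 3))
```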

It will be noted that, for the conditions under which $\sigma_{x'}$ approaches $W\sigma_x$,

$$\lim r_{x'j'} = W\,\frac{z}{z'}\,r_{xj}. \qquad (25)$$

Also, as $p$ approaches 1, under the conditions for (25), $r_{x'j'}$ approaches $r_{xj}$, since $z/z'$ then approaches $1/W$.

Carroll's formula for the reliability of a test in multiple-choice form, (4), may be expressed in terms of number-right scores as follows:

$$r_{x_1'x_2'} = \frac{W r_{x_1x_2}\,\sigma_{x_1}\sigma_{x_2}}{\sqrt{W\sigma_{x_1}^2 + R(h - M_{x_1})}\,\sqrt{W\sigma_{x_2}^2 + R(h - M_{x_2})}}. \qquad (26)$$

It will be noted that the reliability of the multiple-choice form of a given test will theoretically always be less than the reliability of the answer-only form of the same test, since each square root in the denominator exceeds $\sqrt{W}\,\sigma_x$ whenever $h > M_x$. However, under the conditions for which $\sigma_{x'}$ approaches $W\sigma_x$, the multiple-choice reliability will theoretically approach the answer-only reliability as a limit.
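Equation (26) is sketched in Python below. The paper does not report all the part means and standard deviations the formula needs. The usage lines therefore use hypothetical parts with mean $h/2$ and standard deviation $h/5$. For such parts, the ratio of multiple-choice to answer-only reliability is $W\sigma_x^2/[W\sigma_x^2 + R(h - M_x)]$. This ratio is about .92 for 36 items and closer to 1 for longer parts.

```python
import math


def predicted_mc_reliability(r_ao, sd1, mean1, sd2, mean2, h, n_options):
    """Equation (26): multiple-choice correlation between two parts predicted
    from the answer-only correlation r_ao and each part's answer-only
    standard deviation and mean (h items per part)."""
    chance_right = 1 / n_options          # R
    w = 1 - chance_right                  # W
    d1 = math.sqrt(w * sd1**2 + chance_right * (h - mean1))
    d2 = math.sqrt(w * sd2**2 + chance_right * (h - mean2))
    return w * r_ao * sd1 * sd2 / (d1 * d2)


if __name__ == "__main__":
    for h in (36, 360):   # hypothetical parts: mean h/2, SD h/5
        print(h, round(predicted_mc_reliability(0.84, h / 5, h / 2, h / 5, h / 2, h, 5), 3))
```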










Comparison of Observed Data and Predicted Values 
In order to determine the extent to which the predicted effects 
of chance success are obtained in practice, a series of mathematics 
tests was designed, employing the same items in answer-only and 
multiple-choice form. 


The Tests 


Four sections of 36 items each were prepared; all sections were 
planned to be parallel in difficulty, discriminative power, solution 
time, and subject matter. Estimates of item difficulty and discrimina- 
tive power were obtained from statistics from previous uses of the 
items in answer-only form. Estimates of subject-matter equivalence 
and time required for solution were subjective. Equivalence of sub- 
ject matter, which included algebra, geometry, and trigonometry, was 
considered rather broadly. 


Each of the four sections was prepared both in multiple-choice 
form (with five answer options) and in answer-only form. Two al- 
ternative methods were considered in selecting answer options. 
Either the true chance situation might be approached as nearly as 
possible by selecting answer options within close range of or very 
similar to the correct answer, or the practical testing situation might 
be approached and answer options be selected on the basis of their 
appeal to the examinee who does not know the correct answer. Since 
the latter approach was felt to be more meaningful for the purposes 
of test construction, options were selected by the usual technique, 
which includes using answers reached by popular wrong methods of 
solution. In some instances, tallies of the frequencies of actual an- 
swers given by examinees to an item in answer-only form were used 
in selecting answer options for the multiple-choice form of the item. 


The four 36-item sections in multiple-choice form may be designated as M1, M2, M3, and M4, and the same sections in answer-only form as A1, A2, A3, and A4, respectively. For purposes of analysis, the items in sections M1, M2, A1, and A2 will be referred to as Set I items; those in sections M3, M4, A3, and A4, as Set II items.


An additional set of 16 mathematics items, alternating multiple- 
choice and answer-only, was administered to all examinees as a basis 
for checking the equivalence of the population groups. 


Four tests were then arranged as follows: 













           Test W            Test X            Test Y            Test Z
Part I     Set of 16 items, common to all tests, with even-numbered items in
           multiple-choice form and odd-numbered items in answer-only form.
Part II    Item Set I        Item Set II       Item Set II       Item Set I
           Section M1        Section A3        Section M3        Section A1
Part III   Item Set I        Item Set II       Item Set II       Item Set I
           Section M2        Section A4        Section M4        Section A2
Part IV    Item Set II       Item Set I        Item Set I        Item Set II
           Section A3        Section M1        Section A1        Section M3
Part V     Item Set II       Item Set I        Item Set I        Item Set II
           Section A4        Section M2        Section A2        Section M4


In order to determine how many examinees actually looked at all items, an easy item, which virtually every examinee who reached it was expected to try, was placed at the end of each section. Fifteen minutes of working time was allowed for Part I and thirty-five minutes for each of the other parts.


The Sample of Individuals Tested 

The tests were administered to a sample of approximately 560 
male examinees of college freshman level or higher. The four test 
booklets were distributed in successive order. The analysis was based
on 138 cases for Test W and on 139 cases for each of the other tests.


Scoring 

When each test was scored, “total-number-correct” scores were 
obtained for each part separately, for Parts II and III combined, and 
for Parts IV and V combined. Thus, for each test and each examinee 
there were a score on the part common to all four tests, a score on 
the multiple-choice items, and a score on the answer-only items.* The 
easy last item in each section was not scored or included in the fur- 
ther analysis. 


Item Analysis 
For each item in Set I and Set II the proportion of examinees 


*Item 17 in Set I proved to have two defensible answers among the answer 
options in multiple-choice form. In scoring and in subsequent analyses, both an- 
swers were considered correct, but the item was treated as a five-choice item for 
prediction purposes. The obtained proportion correct in multiple-choice form was 
.88 for Part II and .83 for Part IV; in answer-only form, .68 and .76. The two 
obtained biserial correlations in multiple-choice form were .55 and .88; in answer- 
only form, .51 and .63. The item’s average multiple-choice difficulty approached 
that predicted for a two-choice item, but its average biserial was more nearly 
that predicted for a five-choice item. 













that marked the correct answer choice was computed. This propor- 
tion had as its base the number of examinees who answered the giv- 
en item or a subsequent item in the same part. Since the number of 
candidates who completed the scored items in the different parts 
varied from 32 to 109, the aim of a true power test was not met, and 
many of the proportion-correct figures are not based on the total 
number of examinees. For each answer-only item, the predicted mul- 
tiple-choice proportion correct was computed from the answer-only 
proportion correct, using equation (8). 
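The base used for these proportions can be made concrete. The Python sketch below assumes a hypothetical data layout: one list per examinee, with None for an item left unanswered. An examinee counts toward an item's base if he answered that item or any later item in the part, so an item skipped before a later answer counts as not correct. How the study actually treated such omissions is an assumption here.

```python
def proportions_correct(responses):
    """Proportion correct for each item of one part.

    responses[k][j] is 1 (right), 0 (wrong), or None (no answer) for
    examinee k on item j. The base for item j is the number of examinees
    who answered item j or some later item in the same part."""
    n_items = len(responses[0])
    # Position of the last item each examinee answered (-1 if none).
    last_answered = [
        max((j for j, a in enumerate(row) if a is not None), default=-1)
        for row in responses
    ]
    proportions = []
    for j in range(n_items):
        base = sum(1 for last in last_answered if last >= j)
        right = sum(1 for row in responses if row[j] == 1)
        proportions.append(right / base if base else None)
    return proportions


if __name__ == "__main__":
    data = [[1, 1, 0, 1], [1, 0, None, None], [0, 1, 1, None], [1, None, 1, 0]]
    print(proportions_correct(data))
```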

Also, for each item in Sets I and II, the biserial coefficient of 
correlation was computed, with the total score on the item set of 
which the item was a part as the criterion. Hence, multiple-choice 
items were analyzed against the multiple-choice score as the criterion, 
and answer-only items were analyzed against the answer-only score 
as the criterion. For each answer-only item the predicted multiple- 
choice biserial coefficient of correlation was computed from the an- 
swer-only biserial coefficient, using equation (24). 


Analysis of Data 

In order to determine whether the item analysis and reliability data for the multiple-choice form differed from those predicted for it from the answer-only data by more than could be accounted for by chance, certain comparisons of observed values with the corresponding predicted values were made. In the presentation of these comparisons,

$\hat{p}$ = the observed sample value of the answer-only proportion correct,

$\hat{p}'$ = the observed sample value of the multiple-choice proportion correct,

$\hat{r}$ = the observed sample value of the answer-only biserial,

$\hat{r}'$ = the observed sample value of the multiple-choice biserial,

$\tilde{p}'$ = the value of the multiple-choice proportion correct predicted by equation (8) from the observed answer-only proportion correct, and

$\tilde{r}'$ = the value of the multiple-choice biserial predicted by equation (24) from the observed answer-only biserial.
















The subscript 1 indicates that the item appeared in Part II or III, the 
subscript 2, that it appeared in Part IV or V. 


Group equivalence. To determine whether the item statistics and 
reliability values can be compared directly from test to test, the four 
groups were compared on the basis of their scores on Part I. 

An analysis of variance of the differences among the means of 
the various groups indicated that they were not significantly differ- 
ent. (F = 1.3, d.f. = 3 and 551, p > 5%.) 
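The group-equivalence check is an ordinary one-way analysis of variance. A minimal Python sketch follows. The Part I scores themselves are not reported, so it cannot reproduce $F = 1.3$. It does show the computation and confirms that groups of 138, 139, 139, and 139 examinees give 3 and 551 degrees of freedom.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance
    of the differences among group means."""
    scores = [x for g in groups for x in g]
    grand_mean = fmean(scores)
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))   # toy data
```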

Item difficulty. The regression line of $\hat{p}'$ on $\hat{p}$ was compared with the theoretical relationship between multiple-choice and answer-only proportion correct for five-choice items:

$$p' = .8p + .2. \qquad (27)$$


To test whether the regression parameters of the observed diffi- 
culty values differed from the theoretical parameters, given by equa- 
tion (27), more than would be expected by chance, the 95% confi- 
dence limits of the true regression parameters were computed for 
each regression line. (See Table 1.) The significance tests used are 
those described by Wilks in (16) for the regression coefficient and 
in (17) for the regression intercept. The hypothesis that the theoreti- 
cal parameters fall within the 95% confidence limits of the true para- 
meter values is supported for seven of the eight regression lines. In 


the case of one Item Set I regression of $\hat{p}'$ on $\hat{p}$, the parameters are outside the 95% confidence range in the direction of the parameters of the relationship $p' = p$. However, since the tendency of
the obtained Set I regression coefficients to exceed the theoretical co- 
efficients is offset by the tendency of the obtained Set II regression 
coefficients to fall below the theoretical, there seems to be no basis for 
attaching significance to the direction of the difference. No variation 
in the test content or administration was found which would account 
for the noted difference between Item Sets, although the probability 
of this difference occurring by chance (according to a simple signs 
test) would be only .008. 
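The .008 figure is consistent with a two-sided sign test in which all eight regression coefficients in Table 1 fall on the side of .80 that the Set I versus Set II pattern predicts. That gives $2 \times (1/2)^8 \approx .008$. This reading of the count is an assumption. The sketch below computes the exact two-sided sign-test probability and applies it to the coefficients as printed in Table 1.

```python
from math import comb


def sign_test_two_sided(agreeing: int, n: int) -> float:
    """Exact two-sided sign-test probability of a split at least as
    extreme as `agreeing` out of `n`, when each sign is equally likely."""
    k = max(agreeing, n - agreeing)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


if __name__ == "__main__":
    # Regression coefficients as printed in Table 1 (theoretical value .80)
    set_one = [0.807, 0.866, 0.836, 0.820]
    set_two = [0.769, 0.765, 0.780, 0.738]
    agreeing = sum(b > 0.80 for b in set_one) + sum(b < 0.80 for b in set_two)
    print(agreeing, round(sign_test_two_sided(agreeing, 8), 4))   # 8 0.0078
```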

As an additional check on the relative difficulty of items in the two answer forms, the means and standard deviations of the proportion correct, observed and predicted by equation (8), are shown in Table 2. It will be noted that the $\tilde{p}'_1$ means for both sets correspond closely to the $\hat{p}'_1$ and $\hat{p}'_2$ means, but that the $\tilde{p}'_2$ values are consistently higher.

























TABLE 1*
Regression of $\hat{p}'$ on $\hat{p}$

                                     Theoretical   Obtained Values
                                     Values        $\hat{p}'_1$ on $\hat{p}_1$   $\hat{p}'_2$ on $\hat{p}_2$   $\hat{p}'_2$ on $\hat{p}_1$   $\hat{p}'_1$ on $\hat{p}_2$
Item Set I
  Regression coefficient and
    95% confidence limits            .80           .807 (±.094)    .866 (±.064)    .836 (±.089)    .820 (±.082)
  Regression intercept and
    95% confidence limits            .20           .201 (±.051)    .124 (±.088)    .181 (±.048)    .154 (±.048)
  Correlation coefficient                          .895            .953            .910            .920
Item Set II
  Regression coefficient and
    95% confidence limits            .80           .769 (±.099)    .765 (±.077)    .780 (±.075)    .738 (±.106)
  Regression intercept and
    95% confidence limits            .20           .224 (±.055)    .193 (±.046)    .222 (±.042)    .204 (±.064)
  Correlation coefficient                          .876            .919            .924            .852

*$\hat{p}_1$ and $\hat{p}'_1$ = observed answer-only and multiple-choice proportions correct, respectively, for Parts II and III.
$\hat{p}_2$ and $\hat{p}'_2$ = observed answer-only and multiple-choice proportions correct, respectively, for Parts IV and V.
TABLE 2*
Comparison of Item Difficulty Means and Standard Deviations

Statistic     $\hat{p}_1$   $\hat{p}_2$   $\hat{p}'_1$   $\hat{p}'_2$   $\tilde{p}'_1$   $\tilde{p}'_2$
Item Set I
  Mean          .473        .523        .583         .577         .580           .619
  σ             .256        .259        .231         .235         .207           .208
Item Set II
  Mean          .494        .542        .608         .607         .597           .632
  σ             .259        .265        .228         .219         .205           .205

*$\hat{p}_1$ and $\hat{p}'_1$ = observed answer-only and multiple-choice proportions correct, respectively, for Parts II and III.
$\hat{p}_2$ and $\hat{p}'_2$ = observed answer-only and multiple-choice proportions correct, respectively, for Parts IV and V.
$\tilde{p}'_1$ = multiple-choice proportion correct predicted by equation (8) from observed answer-only proportion correct for Parts II and III.
$\tilde{p}'_2$ = multiple-choice proportion correct predicted by equation (8) from observed answer-only proportion correct for Parts IV and V.


This difference, which reflects that between $\hat{p}_2$ and $\hat{p}_1$, may indicate the influence of a practice effect. However, the average observed multiple-choice proportion correct does not seem to be increased by practice. The variance of the observed multiple-choice proportion correct values lies consistently between the variance of the predicted multiple-choice proportion correct and that of the observed answer-only proportion correct.

TABLE 3*
Comparison of Biserial Correlation Values for Items in Multiple-Choice and Answer-Only Form
[Table body not recoverable from the scan. Column groups: observed values, same answer form; observed values, different answer form; observed values with predicted values. Rows, for Item Set I and for Item Set II: σ of estimate of first value from second; σ of estimate of second value from first; σ of estimate from line of 45° slope through origin; difference between means of values (first minus second); correlation between values.]

*$\hat{r}_1$ and $\hat{r}'_1$ = observed answer-only and multiple-choice biserials, respectively, for Parts II and III.
$\hat{r}_2$ and $\hat{r}'_2$ = observed answer-only and multiple-choice biserials, respectively, for Parts IV and V.
$\tilde{r}'_1$ = multiple-choice biserial predicted by equation (24) from observed answer-only biserial for Part II or III.
$\tilde{r}'_2$ = multiple-choice biserial predicted by equation (24) from observed answer-only biserial for Part IV or V.


Item-test correlation. The correlations between $\hat{r}'$ and $\tilde{r}'$ were sufficiently low to rule out a satisfactory comparison of the obtained regression coefficients and the coefficients of the theoretical relationship by the method used in analyzing item difficulty. In order to determine whether these low correlations between the observed and predicted biserials represented a significantly low relationship, or whether they were as high as could be expected from the extent of agreement between two sets of observed biserials for the same items in the same answer form, correlations were obtained between $\hat{r}_1$ and $\hat{r}_2$, and between $\hat{r}'_1$ and $\hat{r}'_2$.

In order to determine further whether the values obtained by the application of equation (24) to the observed answer-only values were in closer agreement with the observed multiple-choice biserials than were the observed answer-only values, correlations were also obtained between $\hat{r}$ and $\hat{r}'$.
Since there were too few extreme values of the biserial coeffici- 


ent to stabilize regression lines, a direct comparison of correlation 


TABLE 4
Reliability of Tests

Statistic                          Test W   Test X   Test Y   Test Z
Number of cases                      138      139      139      139
Mean score
  Observed answer-only              37.7     31.6     36.2     31.1
  Observed multiple-choice          37.8     39.8     39.7     42.2
Standard deviation of scores
  Observed answer-only              11.3     10.6     10.9     10.6
  Observed multiple-choice          10.2     10.4      9.6     10.3
Reliability
  Observed answer-only*              .84      .80      .85      .81
  Observed multiple-choice*          .81      .76      .72      .81
  Predicted multiple-choice†         .75      .69      .75      .69

*Correlation between two parallel separately timed parts.
†Correlation predicted by equation (26) from the observed answer-only correlation between parts.













coefficients was not considered meaningful, and hence the following 
standard errors of estimate were computed: 


1. the usual standard error of estimate from the best-fit regression line, and

2. the standard error of estimate from a line through the origin with a 45° slope,

$$\sqrt{\sigma_X^2 + \sigma_Y^2 - 2r_{XY}\sigma_X\sigma_Y + (\bar{X} - \bar{Y})^2},$$

in order to test the hypothesis that the compared values are truly equal. Table 3 shows these errors of estimate, the correlations, and the differences between mean values.
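Both errors of estimate can be computed from paired biserials. The Python sketch below uses hypothetical data and assumes population moments. With those moments, the 45° quantity is simply the root-mean-square of the paired differences $Y - X$. That is why it tests exact equality rather than mere linear agreement, and why it can never be smaller than the best-fit error.

```python
import math
from statistics import fmean, pstdev


def errors_of_estimate(x, y):
    """Return (standard error of estimate of y from the best-fit line,
    standard error of estimate from the 45-degree line through the origin),
    using population (divide-by-n) moments."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    r = fmean([(a - mx) * (b - my) for a, b in zip(x, y)]) / (sx * sy)
    best_fit = sy * math.sqrt(max(0.0, 1 - r * r))
    line_45 = math.sqrt(sx ** 2 + sy ** 2 - 2 * r * sx * sy + (mx - my) ** 2)
    return best_fit, line_45


if __name__ == "__main__":
    observed = [0.45, 0.52, 0.38, 0.60, 0.49, 0.33]    # hypothetical biserials
    predicted = [0.41, 0.47, 0.36, 0.50, 0.46, 0.35]
    print(errors_of_estimate(observed, predicted))
```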

The values in Table 3 seem to indicate that the lack of agreement 
between observed and predicted values is no greater than that be- 
tween two sets of observed values for items in the same answer form. 
It will be noted, however, that the standard error of estimate where 
observed and predicted multiple-choice biserials are being compared 
is consistently lower than the corresponding standard error of esti- 
mate where observed multiple-choice values are being compared with 
observed answer-only values. This would indicate that the values ob- 
tained by the application of equation (24) to the observed answer- 
only biserials are more in agreement with the observed multiple-choice 
values than are the observed answer-only values. 

From the differences between mean values, it is seen that the ob- 
served answer-only biserials are greater than the observed multiple- 
choice biserials for the same items by an average of .08 point, but 
that the predicted multiple-choice biserials are an average of .05 
point lower than the observed multiple-choice biserials. It would ap- 
pear, therefore, that on the average equation (24) over-corrects. 


Test reliability. The reliability values reported in Table 4 for
observed answer-only and multiple-choice scores are the correlations 
between two separately timed parts, Part II and Part III or Part IV 
and Part V. The predicted multiple-choice reliabilities were computed 
from the observed answer-only reliabilities using equation (26). It 
will be noted, then, that each of the reported reliability values is for 
a test of 36 items. 

From Table 4 it is seen that the observed multiple-choice reli- 
ability values tend to fall between the observed answer-only and pre- 
dicted multiple-choice values. In order to test whether the differences 
among the observed reliability figures were significant, each reli- 
ability coefficient was transformed to a Fisher z-value. A chi-square 
test (14) was then applied to the eight values for observed multiple- 
choice and answer-only. Since the obtained chi-square for the eight 













values was 13.6, which is less than 14.1, the value that chi-square with seven degrees of freedom exceeds with probability .05 under the null hypothesis, it was concluded that the differences among the observed multiple-choice and answer-only reliability values are not significant.
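A minimal sketch of a chi-square test of this kind is given below. It assumes the usual form $\chi^2 = \sum (n_i - 3)(z_i - \bar{z})^2$, where $\bar{z}$ is the $(n_i - 3)$-weighted mean of the Fisher z-values, with $k - 1$ degrees of freedom. Applied to the rounded coefficients of Table 4, it gives about 12.0 rather than the reported 13.6, so it should not be read as reproducing the study's exact computation. Both values fall below 14.1.

```python
import math


def chi_square_homogeneity(rs, ns):
    """Chi-square for the hypothesis that independent correlations share one
    population value: sum of (n - 3)(z - z_bar)**2, z = Fisher's atanh(r)."""
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    chi2 = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return chi2, len(rs) - 1   # statistic and degrees of freedom


if __name__ == "__main__":
    # Table 4: observed answer-only (W, X, Y, Z), then observed multiple-choice
    rs = [0.84, 0.80, 0.85, 0.81, 0.81, 0.76, 0.72, 0.81]
    ns = [138, 139, 139, 139] * 2
    chi2, df = chi_square_homogeneity(rs, ns)
    print(round(chi2, 1), df)
```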

To test whether or not the differences between observed and pre- 
dicted multiple-choice reliabilities were significant, the 95% confi- 
dence limits of the true multiple-choice Fisher z-values were com- 
puted. When predicted and observed multiple-choice reliabilities are 
compared for the same group but different item sets, the hypothesis
that the predicted multiple-choice z-value falls within the 95% con- 
fidence limits of the true Fisher z-value is supported in three out of 
four cases. When the set of items is held constant but the groups dif- 
fer, the predicted multiple-choice z-value is within the 95% confidence 
limits of the true multiple-choice Fisher z-value in six out of eight 
cases. 
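The confidence-limit comparison can be sketched as follows, assuming the usual large-sample standard error $1/\sqrt{n - 3}$ for a Fisher z-value. The pairings in the usage lines are illustrative only and do not reproduce the specific comparisons counted above.

```python
import math
from statistics import NormalDist


def fisher_z_limits(r: float, n: int, level: float = 0.95):
    """Confidence limits for the population Fisher z, given r from n cases."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return z - half_width, z + half_width


def predicted_within_limits(predicted_r: float, observed_r: float, n: int) -> bool:
    """True if the predicted coefficient's z-value lies inside the limits
    computed from the observed coefficient."""
    low, high = fisher_z_limits(observed_r, n)
    return low <= math.atanh(predicted_r) <= high


if __name__ == "__main__":
    print(predicted_within_limits(0.75, 0.81, 139))   # True
    print(predicted_within_limits(0.69, 0.81, 139))   # False
```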

In the absence of any clear-cut evidence to the contrary, it may 
be concluded that the observed multiple-choice reliability appears to 
lie between the observed answer-only value and the multiple-choice 
value which is predicted from the observed answer-only value by equa- 
tion (26). The observed multiple-choice value does not differ con- 
sistently from either the observed answer-only value or the predicted 
multiple-choice value. 


Discussion 


Although, in most instances, the differences between observed 
and predicted multiple-choice statistics do not seem to be significant, 
there appears to be a fairly consistent tendency for the differences, 
slight though they may be, to be in the direction of the answer-only 
statistics, away from the predicted multiple-choice statistics. 

Some explanation for this discrepancy between the theoretical 
and observed statistics may be found by examining the extent to 
which the first three assumptions were met. 

As indicated in the description of the item analysis, although 
the aim was to set a test such that nearly every examinee would at- 
tempt every item, this aim was not met. To test whether “drop-out” 
as such influenced the results, the last twelve items in each part were 
plotted in a distinctive color on the correlation plots of item difficulty 
and biserial correlation. However, there seemed to be no definite pat- 
tern distinguishing these items from the others. 

Since examinees may make careless errors or may omit items 
which they are able to solve, the assumption that an examinee who 
















knows the correct answer to an item answers the item correctly in 
both multiple-choice and answer-only form would not be expected to 
hold in the current experimental situation. There seems to be no ob- 
jective way of determining exactly to what extent this assumption 
was met. 

Although the assumption that an examinee who does not know 
the correct answer to an item answers the item incorrectly in answer- 
only form probably held, no attempt was made to meet the assump- 
tion that such an examinee will answer the item according to chance 
in multiple-choice form, since it was felt desirable to match the cur- 
rent test construction practice of including among the options for a 
multiple-choice item those incorrect answers which will be obtained by 
a large number of examinees on the basis of a wrong solution. Logi- 
cally, the result of such a procedure is to eliminate the operation of 
chance for those examinees who use a popular wrong method of solu- 
tion. Hence, the failure to utilize wrong answer options which would 
be equally attractive to an examinee who does not know the correct 
answer might alone account for the tendency of observed multiple- 
choice values to differ from predicted multiple-choice values in the 
direction of answer-only values. 

One factor which may have operated to keep the proportion an- 
swering correctly high in multiple-choice form in spite of a reduced 
chance element is the check which examinees have on their answers. 


Although these explanations cannot be established from the data
presented in the current study, the evidence does seem to indicate that
item-test correlation and test reliability may not be as adversely af- 
fected by the multiple-choice form as has frequently been assumed. 


REFERENCES 

1. Carroll, J. B. The effect of difficulty and chance success on correlations be- 
tween items or between tests. Psychometrika, 1945, 10, 1-19. 

2. Denney, H. R., and Remmers, H. H. Reliability of multiple-choice measur- 
ing instruments as a function of the Spearman-Brown prophecy formula, 
II. J. educ. Psychol., 1940, 31, 699-704. 

3. Guilford, J. P. The determination of item difficulty when chance success is 
a factor. Psychometrika, 1936, 1, 259-264. 

4. Horst, Paul. The chance element in the multiple choice test item. J. gen. 
Psychol., 1932, 6, 209-211. 

5. Horst, Paul. The difficulty of a multiple-choice test item. J. educ. Psychol.,
1933, 24, 229-232.

6. Horst, Paul. The difficulty of multiple choice test item alternatives. J. exp.
Psychol., 1932, 15, 469-472.

7. Johnson, A. P. An index of item validity providing a correction for chance 
success. Psychometrika, 1947, 12, 51-58. 





8. Lord, F. M. Reliability of multiple-choice tests as a function of number of
choices per item. J. educ. Psychol., 1944, 35, 175-180.

9. Remmers, H. H., and Adkins, R. M. Reliability of multiple-choice measuring
instruments as a function of the Spearman-Brown prophecy formula, VI.
J. educ. Psychol., 1942, 33, 385-390.

10. Remmers, H. H., and Ewart, Edwin. Reliability of multiple-choice measur-
ing instruments as a function of the Spearman-Brown prophecy formula,
III. J. educ. Psychol., 1941, 32, 61-66.

11. Remmers, H. H., and House, J. M. Reliability of multiple-choice measuring
instruments as a function of the Spearman-Brown prophecy formula, IV.
J. educ. Psychol., 1941, 32, 372-376.

12. Remmers, H. H., Karslake, Ruth, and Gage, N. L. Reliability of multiple-
choice measuring instruments as a function of the Spearman-Brown proph-
ecy formula, I. J. educ. Psychol., 1940, 31, 583-590.

13. Remmers, H. H., and Sageser, H. W. Reliability of multiple-choice measur-
ing instruments as a function of the Spearman-Brown formula, V. J. educ.
Psychol., 1941, 32, 445-451.

14. Rider, P. R. An introduction to modern statistical methods. New York: John
Wiley & Sons, Inc., 1939.

15. Votaw, D. F. Notes on validation of test items by comparison of widely
spaced groups. J. educ. Psychol., 1934, 25, 185-191.

16. Wilks, S. S. Elementary statistical analysis. Princeton, New Jersey: Prince-
ton Univ. Press, 1948.

17. Wilks, S. S. Unpublished notes on the derivations of the confidence limits of
the regression intercept.


Manuscript received 2/15/51 
Revised manuscript received 6/1/51 











PSYCHOMETRIKA—VOL. 17, NO. 1 
MARCH, 1952 


A FACTOR ANALYSIS OF WOMEN’S MEASUREMENTS TAKEN 
FOR GARMENT AND PATTERN CONSTRUCTION* 


HELEN HEATH 
CHICAGO, ILLINOIS 


In order to facilitate garment and pattern construction, a study involving a minimum of fifty-five measurements on several thousand women was conducted by the Bureau of Home Economics. Partial correlations with age held constant were computed for a representative group of 4,128 of the women. The correlations among twenty-nine of these variables served as the basis of the present study. By a combination of the multiple-group and the centroid methods of factoring, five factors were extracted. After twenty-nine rotations, simple structure was evident, and the factors were interpreted as bone length, size of joints, circumference below the waist, circumference of extremities, and circumference above the waist. The intercorrelations of the primaries were computed, and two second-order factors were extracted. One of these seems to be primarily related to the growth of fatty tissue and the other to the development of the bones.


Prior to the era of psychology there was at least a half-hearted conviction that personality was related through some obscure channels to body build. In the hope of verifying or refuting this view, members of the Italian School of Clinical Anthropology (6, 13, 16, 27), Kretschmer (9) in Germany, Sheldon (20, 21, 22) in America, and numerous other students of human behavior have formulated and carefully investigated hypotheses. An extended historical review of these studies is presented by Cabot (3).

Within fairly recent times the technique of factor analysis has been applied to this problem. The original plan for the factor analysis studies was to determine whether or not a single factor is sufficient to account for human growth. Later investigations which utilized the multiple-factor methods were designed with the immediate aim of revealing the parameters of physical growth and with the ulterior hope that the causative factors responsible for growth may later be linked with particular characteristics of temperament. Spearman (23), Burt (1, 2), Cohen (4, 5), Mullen (12), McCloy (10), Holzinger and Harman (8), Hammond (7), Rees (17), Rees and Eysenck (18), Thurstone (24, 25), and Moore and Hsü (11) are among the authors who have reported studies on factor analysis of anthropometric measurements. Subjects on whom the measurements were taken include normal children, neurotic or delinquent children, adolescents, college students, average male adults, neurotics, psychotics, and criminals. The methods of factoring vary with the preferences of the authors and range from Spearman's general factor method, through the bi-factor methods, to multiple-factor methods without rotations, with orthogonal rotations, and with oblique rotations. Furthermore, there is little consistency regarding the nature and number of variables selected. Some studies include face and head measures, some hand measures, and others are predominantly measures of the trunk and limbs.

*The author wishes to express her appreciation to Professor L. L. Thurstone and to the members of the Psychometric Laboratory Staff for advice relative to this study.

With so many variations, precise uniformity of results could scarcely be anticipated. However, certain factors appear in several of the studies. These will be considered in relation to the factorial composition of the present study, which is based on a correlation matrix of twenty-nine measurements taken on 4,128 normal adult women.


Source of the Data 

The data for this investigation are contained in a government 
bulletin (14) which reports a study conducted between July 14, 1939, 
and June 1, 1940, by the Bureau of Home Economics subsidized by a 
Federal Project Grant of the Work Projects Administration. The 
aim of the research was to standardize sizes of women’s clothing in 
order to facilitate garment and pattern construction. A minimum of 
fifty-five measures were taken on over 14,000 women with normal 
physique who were residing or visiting in the District of Columbia, 
Arkansas, California, Illinois, Maryland, New Jersey, North Caro- 
lina, or Pennsylvania. Both native and foreign-born women were rep- 
resented; all, however, were of the Caucasian race. Although mem- 
bers of a variety of economic backgrounds participated, it was the 
final opinion of those in charge of the measuring program that the 
average woman measured belonged to a lower-than-average income 
group. The age of the subjects ranged from eighteen to eighty. 

Ten schools were established to provide short courses for training the measurers, and standards of precision had to be met before a measurer was considered qualified to participate. The entire program was carefully supervised, and one may have confidence in the accuracy of the results. A detailed statistical study was made on 4,128 cases. Twenty horizontal measures were correlated and factored by the method of principal components. This was the appropriate procedure for the purpose for which that study was designed, as it indicated which girth measures were most essential to consider in garment construction. The primary purpose of the present study was to investigate the underlying domain of growth, and for that reason a factoring and rotational procedure which resulted in simple structure was used. Both complete correlations and partial correlations with age held constant had been computed for the group of 4,128 women. Since age influences size even during adulthood, the partial correlations were considered more suitable for the factor analysis. The twenty-nine variables which were selected from the total available list are presented in Table 1.


Procedure and Results*


The first step in any factor analysis is to select a method of estimating the communalities. For the present study the following formula was used (26, pp. 300 and 318):

$$h_1^2 = \frac{\left(\Sigma_1 + t_1\right)^2}{\Sigma_T + \Sigma_t}$$

The application of this formula requires that a matrix be constructed from the correlation coefficients among the measure for which the communality is being estimated, denoted as variable 1, and the three or more variables which correlate highest with it. In the formula,

$h_1^2$ is the estimated communality for variable 1,
$\Sigma_1$ is the sum of the known coefficients in column 1,
$t_1$ is the highest coefficient in column 1,
$\Sigma_T$ is the sum of the known coefficients in all the columns, and
$\Sigma_t$ is the sum of the highest known coefficients in each of the columns.
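As a rough illustration, the formula can be applied to stature (variable 2) and the three variables that correlate highest with it in Table 1: hip height (.82), total posterior arm length (.76), and tibiale height (.74). The Python sketch below assumes that the unknown diagonal cells are simply left out of the sums and that "highest coefficient" means the largest off-diagonal value in a column; the function name is hypothetical, and the result (about .82) is only a starting estimate, not the communality later reported in Table 2.

```python
def thurstone_communality(r, target=0):
    """Starting communality estimate h^2 for variable `target`.

    r is a small correlation submatrix (list of lists); its diagonal is
    unknown and is ignored. Column sums and column maxima are taken over
    the off-diagonal ("known") coefficients only.
    """
    n = len(r)
    col_sums = []  # Sigma_j: sum of the known coefficients in column j
    col_max = []   # t_j: highest known coefficient in column j
    for j in range(n):
        known = [r[i][j] for i in range(n) if i != j]
        col_sums.append(sum(known))
        col_max.append(max(known))
    numerator = (col_sums[target] + col_max[target]) ** 2
    denominator = sum(col_sums) + sum(col_max)  # Sigma_T + Sigma_t
    return numerator / denominator


# Stature, hip height, tibiale height, total posterior arm length (Table 1).
R = [
    [None, 0.82, 0.74, 0.76],
    [0.82, None, 0.72, 0.74],
    [0.74, 0.72, None, 0.65],
    [0.76, 0.74, 0.65, None],
]
h2 = thurstone_communality(R, target=0)
print(round(h2, 3))
```

Here the column sums total 8.86 and the column maxima total 3.14, so the estimate is 3.14 squared divided by 12.00.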


The multiple-group method (26, pp. 170-175) was selected for the factoring. Three factors were originally extracted by this technique; the residuals were computed, and a fourth factor was taken out by the grouping method. New communality estimates were made by summing the squared entries in each row of the factor matrix, and the four factors were then extracted simultaneously by the multiple-group procedure. Residuals were still rather large, but no particular pattern was evident, so the centroid method was used for removing the fifth factor. In this final operation, the highest r in each column of the preceding residual table was accepted as the diagonal entry for that column. Although one residual remained with a value of -.16, all the others ranged between -.06 and .14 inclusive; the standard deviation of the distribution of residuals was .0239. Twenty-nine radial rotations were made before the planes seemed satisfactorily located. The centroid matrix, transformation matrix, and oblique factor matrix are contained in Tables 2, 3, and 4 respectively. The correlations among the primaries were computed and are recorded in Table 5. Diagonal estimates were made by the procedure described previously, and the matrix was factored by the centroid method. Two factors were extracted, and four cycles of factoring were required before the communalities were satisfactorily stabilized. Table 6 contains this second-order centroid matrix. One rotation was made; the transformation and the resulting oblique structure are presented in Tables 7 and 8. The correlation between the two second-order factors is .33.

*Only those tables considered most essential are contained in this article. A microfilmed reproduction of the complete thesis, which includes a more extensive set of tables, may be procured by ordering Dissertation No. T 888 from the Library Dept. of Photographic Reproduction, University of Chicago, for a fee of $1.30.
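Two of the computations mentioned above can be checked against the printed tables. A variable's communality is the sum of its squared loadings in one row of the factor matrix, and, assuming the standard relation in Thurstone's method, the oblique factor matrix is the product V = FΛ of the centroid matrix F (Table 2) and the transformation matrix Λ (Table 3). The sketch below uses only rows 2 through 5 (stature, hip height, tibiale height, arm length), whose entries are legible in this copy; differences of a point or so from Table 4 are expected because every input is rounded to two decimals.

```python
# Rows 2-5 of Table 2 (centroid matrix F, columns I-V).
F = {
    2: [0.91, -0.04, -0.12, -0.15, -0.12],  # stature
    3: [0.86, -0.32, 0.03, -0.13, 0.05],    # hip height
    4: [0.78, -0.21, -0.03, -0.03, -0.02],  # tibiale height
    5: [0.81, -0.08, 0.03, -0.16, -0.01],   # total posterior arm length
}
# Table 3: transformation matrix (rows I-V, columns A-E).
LAMBDA = [
    [0.85, 0.14, 0.03, 0.17, -0.01],
    [-0.49, 0.70, -0.18, 0.30, 0.37],
    [0.07, -0.56, 0.50, -0.33, 0.21],
    [-0.19, -0.26, 0.46, 0.57, -0.76],
    [-0.01, -0.32, -0.72, 0.67, 0.50],
]


def communality(row):
    """h^2: the sum of the squared loadings in one row of F."""
    return sum(a * a for a in row)


def oblique_row(row, lam):
    """One row of V = F * Lambda (projections on the reference axes A-E)."""
    return [sum(row[k] * lam[k][m] for k in range(len(row)))
            for m in range(len(lam[0]))]


for var, row in F.items():
    v = oblique_row(row, LAMBDA)
    print(var, round(communality(row), 2), [round(x, 2) for x in v])
```

For stature, for example, the squared loadings sum to .881, matching the h2 of .88 in Table 2, and the projection on A comes out at .81, as in Table 4.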


Discussion of First- and Second-Order Factors 


Factor A has higher loadings than any of the other factors; it is exceptionally well defined, having only one projection between .12 and .40. The five measurements with significant values are stature, sitting height, arm length, tibiale height, and hip height; the factor is obviously related to the length of bones. Sitting height, which is actually the length of the spine, has a much lower loading than the other four. This variable has the highest projection of any on Factor B, thus suggesting that the length of the backbone is partially determined by influences not affecting the long bones. The nature of this parameter will be discussed more fully in consideration of Factor B. The one low value of Factor A which might possibly be considered in the interpretation is .15 for variable 25, anterior chest width. This result may not be entirely due to error variance, as the anterior chest width is partly determined by the length of the ribs, and it is not unreasonable to assume that the length of the ribs depends to some extent upon the same growth processes which are responsible for the other long bones. With the exception of Spearman (23) and McCloy (10), all the other authors reported a factor which was primarily represented by vertical measures. In most cases measures other than those depending upon bone length were omitted from the factor; Hammond (7), however, found bone length related once positively and once negatively to head measures. Holzinger (8), Rees (17), Rees and Eysenck (18), and Cohen (4, 5) found vertical measures in antithesis to girth measures. Since abnormalities of growth are in many cases dependent upon malfunctioning of the pituitary gland, it is possible that minor variations among normals have the same causation. It was inevitable that the bone length factor was omitted from Spearman's (23) results, as he was interested only in determining whether or not there was a general growth factor analogous to the "g" studied in connection with intelligence. McCloy (10) states that he was tempted to call his general growth factor a linear growth factor, for the linear measures had the highest loadings. Several factor analyses involving individuals at different age levels are reported by McCloy, and he noted that as age increased, the tendency for the linear measurements to predominate became more apparent. However, all the variables except measures of fatty tissue had significant loadings on the factor, so general growth was accepted as the more appropriate designation.

Factor B, which was the most difficult of the five factors to interpret, has only one value above .40, that of sitting height. The next highest, ankle girth, is .34. Stature, wrist girth, and minimum calf girth are in the twenties. Elbow and forearm girth, with values of .19 each, and armscye girth, at .18, have been included in the interpretation. The low values of elbow and forearm girth may be due in part to the fact that these variables also have appreciable loadings on both D and E. All circumference measures were taken with a steel tape, and although great caution was exercised to ensure accurate readings, there was probably a larger relative error on the small girth measures than on the large girth and long-bone measures. If so, this increased relative error may further explain the rather low projections. Factor B has been interpreted as cancellous bone size. The composition of the vertebrae and of the ends of the long bones, which include the protrusions at the joints, is unlike that of the shafts of the long bones (19). The former, known as the cancellous portion of the bone, has an open and spongy structure, in contrast to the dense shafts of the marrow-filled bones. The three significant projections on B which are not joints are stature, forearm girth, and minimum leg girth. Since stature is partially determined by the length of the spine, its inclusion in Factor B is not unexpected. Both the minimum calf girth and the forearm girth are measures taken rather close to a joint, and consequently the enlargement of the bone as conditioned by the joint could influence these variables. Although it has a low loading, armscye girth is included; this is consistent with the interpretation, as the circumference of the armscye is controlled in part by the humerus at the shoulder joint. Two variables which one might expect to have high loadings on Factor B, but which do not, are knee girth at tibiale and bent knee girth. These omissions may be explained by the fact that women tend to have fat deposits around the knee which in many cases may overshadow the influence exerted by the bone circumference. A factor similar to this one has not been explicitly mentioned by any of the other authors, and there is little evidence that it is apparent in any of their results.


Factor C will be omitted for the present and discussed later in 
connection with Factor E, so that the similarities between the two 
may be clearly pointed out. Factor D contains the girth measures of 
the extremities. With two exceptions, all the accepted variables have 
loadings above .30. These two are elbow girth and forearm girth, 
each with a value of .19. Since both of these variables are also in- 
cluded in Factors B and E, the low projections on D are not too dis- 
appointing. The next highest value on D is only .10 for midway thigh 
girth; consequently the dividing line between the significant and the 
insignificant loadings is very distinct. Factor D begins with the knee 
and elbow and includes all girth measures below. This factor has not 
appeared in any of the other studies, although in many cases the re- 
quired measures were taken. Thurstone’s (24) analysis of Hammond’s 
data gave rise to an extremity size factor; however, the extremities 
were the head and the hands rather than the forearm and lower part 
of the leg. 


Factors C and E are complementary in that Factor C includes the girth measurements of the lower half of the body, and Factor E those of the upper half. Minimum calf girth and ankle girth are omitted from the C factor. This may be due to the fact that these are primarily determined by bone circumference, while the other below-the-waist girth measures depend chiefly on fat deposits. Variable 21, upper arm girth, has a value of .22 on Factor C. This is nine points lower than any of the other reasonably high loadings, but is high enough to indicate that there is some connection between the fat on the upper arm and that on the lower portion of the body. On Factor E there are no values between .08 and .26, so the demarcation is very clear. Unlike C, which excludes ankle girth and minimum calf girth, Factor E includes wrist circumference. In fact, every girth measure including and above the abdominal-extension girth has a significant loading on E. The projections of the abdominal-extension girth on these two factors are especially interesting. It has a loading of .35 on Factor C and of .31 on Factor E, thus indicating that it is the dividing line between the upper and lower portions of the trunk. Waist circumference has a slight loading of .17 on C; however, most of its variance is accounted for by E. Weight is about equally represented by the two factors, with a loading of .33 on C and .28 on E.

With the exception of Spearman's three analyses (23) and the second study of Hammond (7), all of the interpretations include some kind of girth factor. Some of these characterize the type of individual, such as "stocky"; others set girth measures opposite vertical measures in bipolar factors; others are described by adjectives such as fat, cross-sectional, or transverse. However, none of these gives any indication of a division such as that found between C and E; no sharp distinction has been noted between the upper and lower girth measures. A verification of this finding is, nevertheless, apparent in Sheldon's observation (22, p. 809) that growth above and below the waist is often not uniform, thus producing dysplastic individuals. As may be expected, the correlation between Factors C and E is very high (.73 in Table 5).


In order to avoid confusion between the first- and second-order factors, the second-order variables will be identified as A, B, C, D, and E, and the new planes will be denoted as X and Y. Plane X has three high loadings, variables C, D, and E, which range in value from .57 to .85. The X factor is obviously interpreted as the fat accumulations on the trunk, arms, and limbs. Had face measures been included in this study, it is possible that plumpness of the face would also have been included in this second-order factor. Variable C, which characterizes the girth measures below the waist, the portion of the body generally containing the largest amount of fatty tissue, has the highest value. This second-order factor is probably the result of the interaction between biological and social determinants. C, D, and E have low positive or low negative loadings on Factor Y.


The remaining two first-order variables, A and B, have projections of .46 and .38 respectively on the Y axis. Y is interpreted as the bone-size factor, since it is determined by the length of the bone shafts and by the size of the cancellous portion of the bones. According to endocrinologists (15), gigantism and dwarfism are due to malfunctioning of the anterior pituitary. If the gland is overactive during early life, bones grow abnormally long; the length is not affected, however, if the overactivity begins later. Instead the joints become excessively enlarged, and if the hyperactivity is severe, acromegaly results. The variation between the length of the bones and the size of the joints may be interpreted as a result of differences in the time of maturation, as characterized by closing of the epiphyses in the long bones and decrease in activity of the anterior pituitary. If these always occurred simultaneously, it may be that the variables in the first-order A and B factors would constitute only a single factor, namely, bone size. Both A and B have practically zero loadings on the X factor. The positive correlation between the second-order factors is accepted as an indication of general growth.
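The second-order solution can be checked in the same way. Multiplying the centroid matrix of Table 6 by the transformation of Table 7 should reproduce Table 8 to within rounding, and the correlation of .33 between X and Y can be recovered from Table 7 alone, assuming Thurstone's relation that the correlations among primary factors are the normalized elements of the inverse of Λ'Λ. For two factors this reduces to -c12 / sqrt(c11 c22), where c = Λ'Λ. The sketch is an illustrative check under that assumption, not the author's own computation.

```python
import math

# Table 6: second-order centroid loadings of the first-order factors A-E
# on the two centroid axes (columns I and II).
F2 = {
    "A": [0.294, -0.383],
    "B": [0.268, -0.315],
    "C": [0.898, 0.237],
    "D": [0.647, 0.110],
    "E": [0.689, 0.350],
}
# Table 7: transformation matrix (rows I, II; columns X, Y).
LAM2 = [[0.781, 0.330],
        [0.625, -0.944]]

# V = F * Lambda, one row per first-order factor (compare Table 8).
V2 = {name: [row[0] * LAM2[0][m] + row[1] * LAM2[1][m] for m in range(2)]
      for name, row in F2.items()}

# Cosines among the reference vectors: c = Lambda' * Lambda (2 x 2).
c11 = LAM2[0][0] ** 2 + LAM2[1][0] ** 2
c22 = LAM2[0][1] ** 2 + LAM2[1][1] ** 2
c12 = LAM2[0][0] * LAM2[0][1] + LAM2[1][0] * LAM2[1][1]
# Normalized off-diagonal element of c^-1: correlation of primaries X and Y.
r_primaries = -c12 / math.sqrt(c11 * c22)

for name, (x, y) in V2.items():
    print(name, round(x, 3), round(y, 3))
print("r(X, Y) =", round(r_primaries, 2))
```

The reference-vector cosine is about -.33, so the primaries correlate about +.33, which agrees with the value stated in the text.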


REFERENCES

1. Burt, C. The analysis of temperament. Brit. J. med. Psychol., 1938, 17, 158-188.
2. Burt, C. Factor analysis of physical growth. Nature, 1943, 152, 75.
3. Cabot, P. S. de Q. The relationship between characteristics of personality and physique in adolescents. Genet. Psychol. Monogr., 1938, 20, 3-120.
4. Cohen, J. Determinants of physique. J. ment. Sci., 1938, 84, 495-512.
5. Cohen, J. Physique, size and proportions. Brit. J. med. Psychol., 1939-41, 18, 323-337.
6. Di Giovanni, A. Clinical commentaries deduced from the morphology of the human body. Translated from the second Italian edition by J. J. Eyre. London and New York, 1919.
7. Hammond, W. H. An application of Burt's multiple general factor analysis of the delineation of physical types. Man, 1942, 42, 4-11.
8. Holzinger, K. J., and Harman, H. H. Factor analysis. Chicago: Univ. Chicago Press, 1941.
9. Kretschmer, E. Körperbau und Charakter. Berlin: Springer, 1921. (Translated into English as Physique and Character by W. J. H. Sprott. London: Kegan Paul, Trench, Trubner, 1925.)
10. McCloy, C. H. An analysis for multiple factors of physical growth of different age levels. Child Develpm., 1940, 11, 249-277.
11. Moore, T. V., and Hsü, E. H. Factorial analysis of biological measurements in psychotic patients. Hum. Biol., 1946, 18, 133-157.
12. Mullen, F. Factors in the growth of girls seven to seventeen years of age. Unpublished Ph.D. dissertation, Department of Education, University of Chicago, 1939.
13. Naccarati, S. The morphological aspects of intelligence. Arch. Psychol., 1921, 6, No. 45.
14. O'Brien, Ruth, and Shelton, W. C. Women's measurements for garment and pattern construction. Miscellaneous Publication No. 454. Washington: U. S. Government Printing Office, 1941.
15. Patten, B. M. Human embryology. Philadelphia: The Blakiston Co., 1946.
16. Pende, N. Constitutional inadequacies. Translated into English by S. Naccarati. Philadelphia: Lea and Febiger, 1928.
17. Rees, L. A factorial study of physical constitution in women. J. ment. Sci., 1950, 96, 619-632.
18. Rees, W. L., and Eysenck, H. J. A factorial study of some morphological and psychological aspects of human constitution. J. ment. Sci., 1945, 91, 8-21.
19. Rowntree, C. W. Bones, disease and injuries of. Encycl. Brit. (14th ed.), Vol. 3, p. 845.
20. Sheldon, W. H., Stevens, S. S., and Tucker, W. B. The varieties of human physique: an introduction to constitutional psychology. New York and London: Harper and Bros., 1940.
21. Sheldon, W. H., and Stevens, S. S. The varieties of temperament: a psychology of constitutional differences. New York and London: Harper and Bros., 1942.
22. Sheldon, W. H. Varieties of delinquent youth: an introduction to constitutional psychology. New York and London: Harper and Bros., 1949.
23. Spearman, C. The abilities of man. New York: MacMillan, 1927.
24. Thurstone, L. L. Factor analysis and body types. The Psychometric Laboratory, Univ. of Chicago, No. 24, 1945.
25. Thurstone, L. L. Analysis of body measurements. The Psychometric Laboratory, Univ. of Chicago, No. 29, 1946.
26. Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. of Chicago Press, 1947.
27. Viola, G. La costituzione individuale. Bologna: L. Cappelli, 1933.


Manuscript received 1/10/51. 
Revised manuscript received 3/1/51. 






















TABLE 1
Reduced Correlation Matrix*

Variable                         1   2   3   4   5   6   7   8   9  10  11  12  13  14
 1. Weight
 2. Stature                     30
 3. Hip height                  18  82
 4. Tibiale height              19  74  72
 5. Total posterior arm length  31  76  74  65
 6. Sitting height              33  67  46  48  44
 7. Bust girth                  88  08  01 -05  15  15
 8. Waist girth                 87  03  03  01  10  10  90
 9. Abdominal-extension girth   90  10  08  06  15  17  85  90
10. Hip girth                   91  18  08  10  20  26  77  78  86
11. Sitting spread girth        89  16  08  07  18  24  76  78  86  93
12. Maximum thigh girth         87  14  05  07  16  23  73  72  80  91  90
13. Midway thigh girth          83  10  03  07  11  20  69  69  75  84  85  91
14. Bent knee girth             76  27  16  22  26  26  59  60  66  78  78  92  78
15. Knee girth at tibiale       75  23  17  28  22  26  58  60  64  78  72  78  75  84
16. Maximum calf girth          77  18  07  11  18  28  61  60  64  78  73  75  77  76
17. Minimum leg girth           60  18  09  11  16  21  45  47  48  57  56  56  57  66
18. Ankle girth                 58  30  21  23  30  36  40  42  42  48  45  44  45  57
19. Neck base girth             67  23  14  14  22  21  68  62  58  54  58  51  49  46
20. Armscye girth               80  20  11  12  26  24  77  78  78  70  69  69  64  58
21. Upper arm girth             86  03 -01 -01  08  15  84  82  82  80  78  80  76  68
22. Elbow girth                 72  21  11  14  25  24  65  63  68  65  62  64  61  59
23. Forearm girth               81  19  08  09  22  26  73  70  71  74  71  74  72  65
24. Wrist girth                 61  31  19  22  34  29  51  52  51  52  49  48  48  57
25. Anterior chest width        59  26  17  20  28  23  55  51  50  49  47  46  45  42
26. Highest bust level width    51  14  10  06  18  18  52  51  47  42  42  40  38  35
27. Posterior chest width       62  14  09  09  18  17  64  62  57  58  51  49  46  39
28. Posterior hip arc           76  18  08  07  16  19  68  64  71  86  80  81  73  68
29. Angle of shoulder slope     05  05  02  02 -01  06 -10 -09 -07 -05 -05 -03 -03 -02

*The initial decimal point has been omitted for all entries.













TABLE 1 (Continued)
Reduced Correlation Matrix*

Variable                        15  16  17  18  19  20  21  22  23  24  25  26  27  28
16. Maximum calf girth          75
17. Minimum leg girth           66  72
18. Ankle girth                 54  55  69
19. Neck base girth             46  47  37  35
20. Armscye girth               57  57  44  48  59
21. Upper arm girth             63  64  47  38  60  79
22. Elbow girth                 58  59  50  47  58  67  70
23. Forearm girth               65  68  56  48  59  78  80  88
24. Wrist girth                 54  58  60  59  46  55  52  58  62
25. Anterior chest width        41  41  32  31  50  51  52  46  50  42
26. Highest bust level width    33  34  26  30  44  45  46  38  41  35  45
27. Posterior chest width       38  48  32  31  51  50  55  47  52  40  23  32
28. Posterior hip arc           63  64  50  41  45  59  67  54  62  44  40  34  44
29. Angle of shoulder slope    -02 -03 -02 -01  00 -08 -04 -01  00 -03 -03 -03 -04 -03

*The initial decimal point has been omitted for all entries.































TABLE 2
Centroid Factor Matrix F

           I      II     III      IV       V      h2
   1     .43     .59     .66     .06    -.04     .97
   2     .91    -.04    -.12    -.15    -.12     .88
   3     .86    -.32     .03    -.13     .05     .86
   4     .78    -.21    -.03    -.03    -.02     .65
   5     .81    -.08     .03    -.16    -.01     .69
   6     .60     .24    -.14    -.07    -.26     .51
   7       ?     .62     .66    -.14     .12     .88
   8       ?       ?     .74    -.08     .15     .87
   9     .22     .54       ?     .00    -.07     .85
  10     .31     .55     .64     .24    -.23     .92
  11     .28     .51     .67     .25    -.19     .89
  12       ?     .00     .66     .29    -.20     .88
  13     .24     .48     .64     .25    -.11     .34
  14     .42     .47     .42     .44     .08     .77
  15     .41     .40     .48     .48     .14     .81
  16     .04     .48     .46     .46     .09     .78
  17     .04     .62     .09     .48     .16     .76
  18     .47     .56    -.01       ?     .15     .63
  19     .30     .43     .46    -.16     .17     .54
  20     .31     .58     .00    -.11     .05     .70
  21     .15     .57     .71    -.02     .09     .86
  22     .33     .58     .40    -.01     .14     .63
  23     .31     .65     .47     .03     .12     .75
  24     .45     .52     .20     .09     .21       ?
  25       ?     .06     .35    -.11     .08     .38
  26     .20     .38       ?    -.14     .10     .32
  27     .22     .41     .41    -.12     .06     .40
  28     .24     .47     .56     .22    -.25     .70
  29     .02     .01    -.09     .02    -.03     .01
TABLE 3
Transformation Matrix

          A       B       C       D       E
  I     .85     .14     .03     .17    -.01
 II    -.49     .70    -.18     .30     .37
III     .07    -.56     .50    -.33     .21
 IV    -.19    -.26     .46     .57    -.76
  V    -.01    -.32    -.72     .67     .50


































TABLE 4
Oblique Factor Matrix V

           A       B       C       D       E
   1     .11     .10     .33     .04     .28
   2     .81     .25    -.01     .02     .01
   3     .92    -.10    -.01     .00     .00
   4     .77    -.01     .04     .04    -.07
   5     .76     .09    -.02     .01     .08
   6     .40     .48     .07     .00    -.02
   7    -.08     .09     .11    -.01     .53
   8    -.04    -.06     .17     .00     .47
   9    -.03     .03     .35    -.08     .31
  10    -.01     .08     .54    -.02     .04
  11    -.01     .01     .53    -.01     .04
  12    -.03     .00     .55     .01     .00
  13    -.06    -.05     .51     .10    -.01
  14     .07     .01     .31     .38    -.04
  15     .09    -.11     .33     .40    -.05
  16     .00    -.02     .38     .28    -.03
  17    -.10     .25     .08     .59    -.04
  18     .07     .34    -.04     .50     .07
  19     .10     .07    -.01     .05     .45
  20     .03     .18     .10     .08     .43
  21    -.10    -.01     .22     .01     .41
  22     .02     .19     .03     .19     .37
  23    -.03     .19     .09     .19     .38
  24     .12     .22    -.06     .36     .26
  25     .15     .10     .03     .04     .38
  26     .03     .12    -.03     .03     .36
  27     .04     .10     .06     .00     .36
  28    -.02     .07     .51    -.05     .00
  29     .00     .07    -.02     .02    -.05
TABLE 5
Correlations Between Primaries

         T_A     T_B     T_C     T_D     T_E
T_A     1.00     .22     .12     .18     .09
T_B      .22    1.00     .22     .11     .05
T_C      .12     .22    1.00     .60     .73
T_D      .18     .11     .60    1.00     .48
T_E      .09     .05     .73     .48    1.00











































TABLE 6
Centroid Factor Matrix F

           I        II
  A     .294     -.383
  B     .268     -.315
  C     .898      .237
  D     .647      .110
  E     .689      .350

TABLE 7
Transformation Matrix

           X         Y
  I     .781      .330
 II     .625     -.944

TABLE 8
Oblique Factor Matrix V

           X         Y
  A    -.009      .459
  B     .012      .386
  C     .849      .072
  D     .574      .110
  E     .757     -.103




















PSYCHOMETRIKA—VOL. 17, NO. 1 
MARCH, 1952 


A TECHNIQUE FOR CRITERION-KEYING AND 
SELECTING TEST ITEMS* 


JOHN W. FRENCH 


EDUCATIONAL TESTING SERVICE 


For multiple-choice tests where no a priori key exists, the 
initial selection of a key for maximum validity may be made on the 
basis of the number of persons choosing each alternative and their 
mean criterion score. The keying formula is derived. Once the initial 
keying has been done, further precision in keying and item selec- 
tion may use, in addition, the mean total test score for persons 
choosing each alternative. Item-selection formulas suggested by 
Horst and by Gulliksen for maximizing test validity are both in the 
form of a ratio, an “item-validity index” divided by an “item-reli- 
ability index.” The formula derived here is shown to be equivalent 
to the numerators of these formulas. The expression in the denominators uses the total test score. Although a radical appears in the denominator of Horst's formula and not in the denominator of Gulliksen's formula, both of them select the same items in practice.


Multiple-choice tests such as those of Practical Judgment, Data 
Interpretation, Word Association, and other objective personality 
tests can be devised in such a way that there is no predetermined cor- 
rect response to the items. The alternative for each item to be scored 
as correct may be selected on the basis of expert judgment. One way 
to obtain appropriate, if not expert, judgment is to administer the 
test to a group of individuals having known scores on an appropriate 
criterion. This method calls for keying as “correct” those alternatives 
that are selected most frequently by those members of the group hav- 
ing high criterion scores. 

The method described here for selection of keyed responses to 
maximize the correlation between total test score and criterion is 
based on the available item statistics tabulated in the course of the 
routine IBM machine item analysis used by the Educational Testing 
Service. For each alternative of each item, the number of testees choosing it and the total criterion score summed over the testees choosing it are tabulated. Thus for each item there are as many N's and mean criterion scores as there are alternatives. Also available is the total criterion score summed over all testees.

The following notation will be used in deriving a formula for selection of item alternatives:

$X_i$ = the number of items answered correctly by individual $i$ according to a particular key,
$x_i$ = the deviation from the mean of $X_i$,
$y_i$ = the deviation score of individual $i$ on the criterion,
$N$ = the number of testees,
$K$ = the number of items in the test,
$\bar{X}$ = the mean test score over all testees,
$\bar{Y}$ = the mean criterion score over all testees,
$N_j$ = the number of testees choosing a particular alternative of item $j$,
$\bar{Y}_j$ = the mean criterion score for testees choosing a particular alternative of item $j$, and
$S_{ij}$ = the raw score of individual $i$ on item $j$ (i.e., 1 when a particular alternative is chosen, 0 when that alternative is not chosen).

*The author gratefully acknowledges the suggestions and criticisms of Dr. Harold Gulliksen, Research Adviser at the Educational Testing Service.

The problem will be solved here by maximizing the validity
coefficient,

$$ r_{xy} = \frac{\sum xy}{N\sigma_x\sigma_y}, $$

by maximizing the numerator $\sum xy$. In the process $\sigma_y$ will be
unaffected as, of course, will $N$. It must, however, be assumed that any
change in $\sigma_x$ will be small in comparison to the change in $\sum xy$.

$$ \sum xy = \sum_{i=1}^{N} (X_i - \bar{X})\,y_i \tag{1} $$

$$ \sum xy = \sum_{i=1}^{N} (X_i y_i - \bar{X}\,y_i) \tag{2} $$


The expression in parentheses may be expressed as a sum over all
items in the test, since $X_i = \sum_{j=1}^{K} S_{ij}$ and
$\bar{X} = \sum_{j=1}^{K} N_j/N$.


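$$ \sum xy = \sum_{i=1}^{N}\sum_{j=1}^{K}\left(S_{ij}\,y_i - \frac{N_j}{N}\,y_i\right) \tag{3} $$

$$ \sum xy = \sum_{j=1}^{K}\left(\sum_{i=1}^{N} S_{ij}\,y_i - \frac{N_j}{N}\sum_{i=1}^{N} y_i\right) \tag{4} $$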
Since $S_{ij} = 0$ when a particular alternative is not chosen,
$\sum_{i=1}^{N} S_{ij}\,y_i$ calls for summation of $y_i$ only over the $N_j$
individuals marking the particular alternative. Thus,


$$ \sum_{i=1}^{N} S_{ij}\,y_i = N_j\bar{Y}_j \tag{5} $$

The second term within the parentheses, on the other hand, is
summed over the total group, since it does not contain $S_{ij}$.





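$$ \frac{N_j}{N}\sum_{i=1}^{N} y_i = N_j\bar{Y} \tag{6} $$

Substituting (5) and (6) in (4), we may write:

$$ \sum xy = \sum_{j=1}^{K} (N_j\bar{Y}_j - N_j\bar{Y}) \tag{7} $$

$$ \sum xy = \sum_{j=1}^{K} \left[N_j(\bar{Y}_j - \bar{Y})\right] \tag{8} $$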
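As a quick numerical check of the step from (1) to (8), the Python sketch below builds hypothetical responses, scores them against an arbitrary key, and computes $\sum xy$ both directly from (1) and item by item from (8). The data, the key, and the name `check_formula_8` are illustrative assumptions. The criterion values here are raw scores; the identity holds equally for deviation scores, because $\sum (X_i - \bar{X}) = 0$ and $N_j(\bar{Y}_j - \bar{Y})$ is unchanged by a shift of origin.

```python
import math
import random


def check_formula_8(n_testees=50, n_items=6, n_alts=4, seed=1):
    """Return sum xy computed from (1) and from (8) for hypothetical data."""
    rng = random.Random(seed)
    # Hypothetical responses: each testee picks one alternative per item.
    choices = [[rng.randrange(n_alts) for _ in range(n_items)]
               for _ in range(n_testees)]
    y = [rng.gauss(50.0, 10.0) for _ in range(n_testees)]  # criterion scores
    key = [rng.randrange(n_alts) for _ in range(n_items)]  # an arbitrary key

    # S[i][j] = 1 if testee i marked the keyed alternative of item j.
    S = [[int(c[j] == key[j]) for j in range(n_items)] for c in choices]
    X = [sum(row) for row in S]  # test scores X_i
    X_bar = sum(X) / n_testees
    Y_bar = sum(y) / n_testees

    # Formula (1): sum over testees of (X_i - X_bar) * y_i.
    direct = sum((X[i] - X_bar) * y[i] for i in range(n_testees))

    # Formula (8): sum over items of N_j * (Ybar_j - Ybar).
    by_item = 0.0
    for j in range(n_items):
        chosen = [y[i] for i in range(n_testees) if S[i][j]]
        if chosen:  # an alternative nobody marked contributes N_j = 0
            by_item += len(chosen) * (sum(chosen) / len(chosen) - Y_bar)
    return direct, by_item


direct, by_item = check_formula_8()
print(math.isclose(direct, by_item, rel_tol=1e-9, abs_tol=1e-6))  # prints True
```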
j=1 


In order, then, to maximize $\sum xy$ and hence $r_{xy}$, it is necessary
to maximize the expression in brackets for each item of the test. This
can be done readily with the values $N_j$, $\bar{Y}_j$, and $\bar{Y}$ available
from routine item analysis. For each item the alternative having the
largest value of $N_j(\bar{Y}_j - \bar{Y})$ is keyed as correct.
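The keying rule needs only the tabulated counts and criterion sums described earlier, since $N_j(\bar{Y}_j - \bar{Y})$ equals the criterion sum for the alternative minus $N_j\bar{Y}$. A minimal sketch, assuming each item's tabulation is supplied as a list of `(N_j, criterion sum)` pairs; this layout and the name `key_items` are hypothetical, not the format of the IBM item-analysis output.

```python
def key_items(item_tables, total_criterion, n_testees):
    """For each item, return the index of the alternative with the
    largest N_j * (Ybar_j - Ybar).

    item_tables: one list per item; entry k is (N_j, criterion sum) for
    alternative k, i.e. the number of testees choosing it and the sum of
    their criterion scores (hypothetical layout).
    """
    y_bar = total_criterion / n_testees
    key = []
    for table in item_tables:
        # N_j * (Ybar_j - Ybar) = (criterion sum) - N_j * Ybar, so no
        # division by N_j is needed and an unchosen alternative scores 0.
        values = [crit_sum - n_j * y_bar for n_j, crit_sum in table]
        key.append(max(range(len(values)), key=values.__getitem__))
    return key


tables = [
    [(4, 240), (3, 120), (2, 90), (1, 50)],
    [(1, 70), (5, 275), (4, 155), (0, 0)],
]
print(key_items(tables, total_criterion=500, n_testees=10))  # [0, 1]
```

With ten testees and a criterion total of 500 (so $\bar{Y} = 50$), the second item is keyed to its second alternative: the single testee choosing the first alternative has the higher criterion mean (70 against 55), but $1 \times 20 = 20$ is smaller than $5 \times 5 = 25$.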

Adkins and Toops (2) suggested the use of point-biserial correla- 
tions between alternatives and criterion scores for keying. Formula 
(8) does very much the same thing as the point-biserial correlation, 
but tends to avoid the selection of alternatives attracting so few sub- 
jects as to be less useful to the total test validity than a more popular 
alternative with a slightly lower biserial correlation. 
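The connection with the point-biserial correlation can be made explicit. Write $p_j = N_j/N$ and $q_j = 1 - p_j$, let $r_{jy}$ be the point-biserial correlation between marking the alternative and the criterion, and let $\bar{Y}'_j$ (notation introduced here) be the criterion mean of those not marking it. Since $\bar{Y} = p_j\bar{Y}_j + q_j\bar{Y}'_j$, the usual form $r_{jy} = \frac{\bar{Y}_j - \bar{Y}'_j}{\sigma_y}\sqrt{p_jq_j}$ can be rewritten as $r_{jy} = \frac{\bar{Y}_j - \bar{Y}}{\sigma_y}\sqrt{p_j/q_j}$, so that the bracket in (8) is

$$ N_j(\bar{Y}_j - \bar{Y}) = N p_j\,\sigma_y\,r_{jy}\sqrt{\frac{q_j}{p_j}} = N\sigma_y\,r_{jy}\sqrt{p_jq_j}. $$

Because $N$ and $\sigma_y$ are the same for every alternative, (8) ranks alternatives by $r_{jy}\sqrt{p_jq_j}$, the point-biserial correlation multiplied by the standard deviation of the 0/1 choice. An alternative marked by very few testees has a small $\sqrt{p_jq_j}$ and must show a markedly higher correlation before it is keyed.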

Formula (8) makes use only of the item-criterion correlations. 
Before the initial keying, there is nothing else available. As soon as 
keying has been done, the tests may be scored and a further refine- 
ment of the key and selection of items can take item-test correlations 
into account. A maximizing function for item selection, using
item-criterion and item-test correlations, has been developed by Horst (6)
and by Gulliksen (4, 5). Both of these assume a key to the total test, 
so that the item-test correlations are known. Horst’s development 
ends up with an expression in the numerator having the same value 
as the bracket of formula (8). This is the item-criterion factor or 
validity index. The denominator of his maximizing function is the 
item-test factor or reliability index. When present notation is used, 
Horst’s expression (6, formula 8) becomes 











$$ r_{xy} = \frac{1}{\sigma_y\sqrt{N}}\cdot\frac{\sum_{j=1}^{K} N_j(\bar{Y}_j - \bar{Y})}{\sqrt{\sum_{j=1}^{K} N_j(\bar{X}_j - \bar{X})}} \tag{9} $$








When the present notation is used, the denominator or reliability
index in Gulliksen’s expression for the test validity reads

$$ \sum_{j=1}^{K} r_{jx}\,\sigma_j, \tag{10} $$

where $r_{jx}$ is the point-biserial item-test correlation and $\sigma_j$ is the
item standard deviation.

The following algebraic manipulation puts Gulliksen’s expres- 
sion into a form such that it can be compared with Horst’s expres- 
sion. 

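Applying to (10) the formula for point-biserial correlation (1),

$$ r_{p\text{-}bis} = \frac{M_p - M_q}{\sigma_t}\sqrt{pq}, \tag{11} $$

where $M_p$ is the mean score on the total test for persons answering
the item correctly, $M_q$ the mean for those answering it incorrectly,
$\sigma_t$ is the standard deviation of scores on the total test, $p$ is the
proportion of persons answering the item correctly, and $q$ is the
proportion answering it incorrectly, and the formula for the item
standard deviation,

$$ \sigma_j = \frac{\sqrt{N_j(N - N_j)}}{N}, \tag{12} $$

(10) becomes

$$ \sum_{j=1}^{K} \left(\bar{X}_j - \bar{X}_{(N-j)}\right)\left[\frac{N_j(N - N_j)}{N^2\sigma_x}\right], \tag{13} $$

where $\bar{X}_j$ is the mean test score of the $N_j$ testees marking the
keyed alternative of item $j$ and $\bar{X}_{(N-j)}$ is the mean test score of
the remaining $N - N_j$ testees.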


Now, the mean score on the test for the total group is

$$ \bar{X} = \frac{N_j\bar{X}_j + (N - N_j)\bar{X}_{(N-j)}}{N}. \tag{14} $$

Therefore,

$$ \bar{X}_{(N-j)} = \frac{N\bar{X} - N_j\bar{X}_j}{N - N_j}. \tag{15} $$


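Substituting (15) into (13) we obtain

$$ \sum_{j=1}^{K}\left(\bar{X}_j - \frac{N\bar{X} - N_j\bar{X}_j}{N - N_j}\right)\left[\frac{N_j(N - N_j)}{N^2\sigma_x}\right]. \tag{16} $$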


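Since $\bar{X}_j - \frac{N\bar{X} - N_j\bar{X}_j}{N - N_j} = \frac{N(\bar{X}_j - \bar{X})}{N - N_j}$,
this may be reduced algebraically to

$$ \sum_{j=1}^{K} \frac{1}{N\sigma_x}\,N_j(\bar{X}_j - \bar{X}). \tag{17} $$

The numerator of Gulliksen’s formula may be treated in a parallel
way so that his expression for the test validity will now read

$$ r_{xy} = \left[\frac{\sigma_x}{\sigma_y}\right]\frac{\sum_{j=1}^{K} N_j(\bar{Y}_j - \bar{Y})}{\sum_{j=1}^{K} N_j(\bar{X}_j - \bar{X})}. \tag{18} $$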


The expression in brackets is a constant, and so may be dropped from
the maximizing expression, although it is required in computing the
validity, $r_{xy}$. Similarly the factor $1/(\sigma_y\sqrt{N})$ in (9) may be
dropped from Horst’s maximizing expression.
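The reduction from (10) to (17), and the form (18), can be checked numerically. The sketch below (Python, standard library only) simulates hypothetical 0/1 item scores driven by a latent ability, keeps every item so that the item-test correlations are computed against the same composite whose validity is wanted, and compares (10) with (17) and (18) with the directly computed $r_{xy}$. Standard deviations use the divisor $N$, matching (12); the data-generating model and all function names are assumptions for illustration.

```python
import math
import random


def pop_sd(v):
    """Standard deviation with divisor N, as in (12)."""
    m = sum(v) / len(v)
    return math.sqrt(sum((u - m) ** 2 for u in v) / len(v))


def pearson(a, b):
    """Product-moment correlation; for a 0/1 variable this is the point-biserial."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (w - mb) for u, w in zip(a, b))
    return sab / (len(a) * pop_sd(a) * pop_sd(b))


def check_17_and_18(n_testees=200, n_items=10, seed=7):
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n_testees)]
    # Hypothetical keyed item scores S_ij and criterion scores y_i.
    S = [[int(a + rng.gauss(0, 1) > 0) for _ in range(n_items)] for a in ability]
    y = [a + rng.gauss(0, 1) for a in ability]
    X = [sum(row) for row in S]
    N = n_testees
    X_bar, Y_bar = sum(X) / N, sum(y) / N
    sd_x, sd_y = pop_sd(X), pop_sd(y)

    expr_10 = 0.0        # sum of r_jx * sigma_j, expression (10)
    sum_v = sum_u = 0.0  # sums of N_j(Ybar_j - Ybar) and N_j(Xbar_j - Xbar)
    for j in range(n_items):
        s_j = [row[j] for row in S]
        N_j = sum(s_j)
        if N_j in (0, N):
            continue     # a constant item has sigma_j = 0 and adds nothing
        expr_10 += pearson(s_j, X) * pop_sd(s_j)
        # Summing deviations over the N_j testees gives N_j(mean_j - mean).
        sum_u += sum(x - X_bar for x, s in zip(X, s_j) if s)
        sum_v += sum(v - Y_bar for v, s in zip(y, s_j) if s)

    expr_17 = sum_u / (N * sd_x)
    r_18 = (sd_x / sd_y) * sum_v / sum_u
    return expr_10, expr_17, r_18, pearson(X, y)


e10, e17, r18, r = check_17_and_18()
print(math.isclose(e10, e17), math.isclose(r18, r))  # prints True True
```

With every item retained, (18) is exact rather than approximate; the approximation enters only when a subset of items is scored while the item-test correlations still refer to another composite.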

It may now be seen that the maximizing expression from Gullik- 
sen’s development is the same as that from Horst’s development ex- 
cept that a radical appears in the denominator of the latter. 
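To see how the two maximizing expressions can differ, write $v_j = N_j(\bar{Y}_j - \bar{Y})$ and $u_j = N_j(\bar{X}_j - \bar{X})$ for each keyed item. With the constant factors dropped, the expression from (18) for a set of items is $\sum v_j / \sum u_j$, and the expression from (9) is $\sum v_j / \sqrt{\sum u_j}$. The sketch below computes both for a chosen subset; the function names and the item values are hypothetical, and $u_j > 0$ is assumed for the selected items.

```python
import math


def gulliksen_index(v, u, selected):
    """Sum of v_j over sum of u_j: (18) without the bracketed constant."""
    return sum(v[j] for j in selected) / sum(u[j] for j in selected)


def horst_index(v, u, selected):
    """Sum of v_j over the square root of the sum of u_j: (9) without its constant factor."""
    return sum(v[j] for j in selected) / math.sqrt(sum(u[j] for j in selected))


v = [30.0, 20.0, 5.0]   # hypothetical N_j(Ybar_j - Ybar) values
u = [10.0, 10.0, 10.0]  # hypothetical N_j(Xbar_j - Xbar) values
for subset in ([0], [0, 1]):
    print(subset, gulliksen_index(v, u, subset), round(horst_index(v, u, subset), 2))
```

In this example the expression from (18) falls from 3.0 to 2.5 when the second item is added, while the expression from (9) rises from about 9.49 to about 11.18, so the radical makes Horst’s expression more willing to lengthen the test.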

It has been pointed out (3) that Horst’s expression is based on 
the assumption that the correlation of the group of selected items 
with the group of unselected items is zero. Gulliksen’s expression is 
based on the assumption that the correlation of the group of selected 
items with the group of unselected items will be unity. Let it suffice 
to indicate that the true circumstances call for a condition lying be- 
tween these assumptions, a correlation between .00 and 1.00, so that 
the two formulas for the denominator may be considered to represent 
upper and lower bounds for the true denominator. 

In applying these formulas, the first step in keying and selecting 
items must be to use (8) in order to arrive at an initial key. Neither 
version of the reliability index can be used at first, since there is no 
test score until after the initial keying is done. 

Refinement in the key, including the keying of more than one 
response to some items and the keying of no response to some items 








106 PSYCHOMETRIKA 


(i.e., item selection), may, then, be made by the application of the 
maximizing expressions of formula (9) or (18). 

The graphical method of item selection suggested in the articles 
by both Horst and Gulliksen can be employed. In spite of the differ- 
ence between formulas (9) and (18), the graphical method suggested 
by Horst (6) as the best practical approximation selects the same 
items as that presented by Gulliksen (4, 5). 

In practice the selection process can end here. Actually several 
further re-applications of the method, as suggested by Horst and 
by Gulliksen, may result in successive further changes in the group 
of items selected, since each successive selection depends upon the 
items already selected. Further applications of the technique will 
result in successive approximations to a final solution which is stable 
or which will oscillate slightly. 
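The re-application described above can be sketched as a loop: score the test on the currently selected items, recompute $v_j = N_j(\bar{Y}_j - \bar{Y})$ and $u_j = N_j(\bar{X}_j - \bar{X})$ for every item against that score, reselect, and stop when a selection recurs. The selection rule used here, keeping the `m` items with the largest ratio $v_j/u_j$, is a hypothetical stand-in for the graphical method of Horst and Gulliksen, which is not reproduced; items with $u_j \le 0$ are treated as ineligible. A repeat of the current selection means the process is stable; a repeat of an earlier one means it is oscillating.

```python
def item_indices(S, y, selected):
    """v_j = N_j(Ybar_j - Ybar) and u_j = N_j(Xbar_j - Xbar) for every item j,
    with X the total score over the currently selected items."""
    N = len(S)
    X = [sum(row[j] for j in selected) for row in S]
    X_bar, Y_bar = sum(X) / N, sum(y) / N
    v, u = [], []
    for j in range(len(S[0])):
        # Summing (score - mean) over testees with S_ij = 1 gives N_j(mean_j - mean).
        v.append(sum(y[i] - Y_bar for i in range(N) if S[i][j]))
        u.append(sum(X[i] - X_bar for i in range(N) if S[i][j]))
    return v, u


def reselect(S, y, m, max_rounds=20):
    """Repeat selection until a selection recurs (hypothetical top-m ratio rule)."""
    selected = tuple(range(len(S[0])))  # start from the whole keyed test
    history = [selected]
    for _ in range(max_rounds):
        v, u = item_indices(S, y, selected)
        ratio = [v[j] / u[j] if u[j] > 0 else float("-inf") for j in range(len(v))]
        best = sorted(range(len(v)), key=lambda j: ratio[j], reverse=True)[:m]
        new = tuple(sorted(best))
        if new in history:  # stable (new == selected) or cycling back
            return new, history
        history.append(new)
        selected = new
    return selected, history


S = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]  # hypothetical 0/1 item scores
y = [3.0, 1.0, 2.0, 0.0]                          # hypothetical criterion scores
print(reselect(S, y, m=2))
```

On this toy matrix the first pass drops the third item, and the second pass returns the same pair `(0, 1)`, so the loop stops as stable.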


REFERENCES 

1. Adkins, D. C. Construction and analysis of achievement tests. Washington,
D. C.: U. S. Government Printing Office, 1947.

2. Adkins, D. C., and Toops, H. A. Simplified formulas for item selection and 
construction. Psychometrika, 1937, 2, 165-171. 

3. Green, B. F., Jr. A note on item selection for maximum validity. To be
published.

4. Gulliksen, Harold. Item selection to maximize test validity. Proceedings of
the 1948 Invitational Conference on Testing Problems: “Validity Norms and
the Verbal Factor.” Princeton, N. J.: The Educational Testing Service, 1949.

5. Gulliksen, Harold. The Theory of Mental Tests. New York: John Wiley & 
Sons, Inc., 1950. 

6. Horst, A. P. Item selection by means of a maximizing function.
Psychometrika, 1936, 1, 229-244.


Manuscript received 2/5/51 
Revised manuscript received 6/13/51 














PSYCHOMETRIKA—VOL. 17, NO. 1 
MARCH, 1952 


A FACTORIAL STUDY OF TEMPERAMENT* 


MELANY E. BAEHR 
UNIVERSITY OF CHICAGO 


The theory is advanced that personality factors obtained in the 
first order may often represent combinations of temperament traits 
that occur in the experimental population. Under these circum- 
stances an investigation of the second order represents a purification 
process and yields factors which represent the more basic or perva- 
sive characteristics of the original behavior items included in the 
factorial study. These second-order factors can be obtained directly 
in the first order by a careful selection of the variables which enter 
into the analysis. A second-order analysis was undertaken of the
nine factors inherent in three of J. P. Guilford’s inventories, and 
four clearly interpretable second-order factors were obtained. Three 
of these factors were obtained directly in the first order in a new 
factorial study of twenty-two behavior items. Attention is drawn to 
the similarities between these factors and traits of temperament pos- 
tulated by an independent investigator. 


Introduction 

Multiple-factor analysis found its original application in the 
investigation of the intellective and cognitive factors of mind. In this 
domain L. L. Thurstone identified the underlying functional unities 
or primary factors of mind. These factors were found to be correlated. 
When these correlations were factored in turn, they yielded second- 
order factors. In more recent work Thurstone has drawn attention 
to the similarity between the most conspicuous second-order factor 
and Spearman’s postulated general intellective factor “g” (8, p. 403).

Multiple-factor analysis is finding increasing application in other 
domains, including that of temperament and personality. The pres- 
ent writer has made a factorial study of temperament which empha- 
sizes the significance of second-order factors in this domain. First, 
a second-order analysis was made of the factors derived from three 
of J. P. Guilford’s inventories (3, 4, 5), and next a new factorial in- 
vestigation was undertaken to show how the interpretation of second- 
order factors could aid in determining the functional unities under- 
lying the domain. 

In this study a temperament trait is regarded as composed of 


*This paper abstracts portions of the writer’s Ph.D. dissertation. 


those related behavior characteristics which are relatively permanent
for the individual. Personality is regarded as the resultant of the in- 
teraction of these temperament traits with the environmental condi- 
tions to which they are exposed. Such personality attributes as a per- 
son’s table manners, habits of personal cleanliness, views about re- 
ligion, his political affiliations, and his social behavior in a given so- 
cial “set” are largely determined by environmental conditions and 
may change from time to time in the light of new experiences. This 
view does not imply a dichotomy of temperament and personality. 
Temperament is the raw material from which personality is fash- 
ioned; and personality is thus the medium through which tempera- 
ment traits manifest themselves. 

The assumptions made in a factorial investigation of tempera- 
ment are essentially similar to those made in a factorial investigation 
of the cognitive domain. We assume that the underlying functional 
unities of the domain can be described by a finite number of linearly 
independent factors. 

The chief difficulty in factorial investigations of temperament 
arises at the outset when the investigator is assembling his first se- 
lection of personality items to be used in the study. He encounters 
considerably more difficulty in this respect than the investigator of 
more concrete and well-defined domains, in which people have been 
schooled to respond to specific stimuli in specific ways. For instance, 
we have been taught to respond in a specific way when given a series 
of numbers to add. The investigator in the intellective domain can 
therefore be reasonably certain that such a test is measuring essen- 
tially the same function in all his subjects. In short, in the intellective 
domain there has been great stress on the standardization and ver- 
balization of response. 

In the domain of temperament the situation is quite different. 
We have not been uniformly schooled to exhibit certain temperament 
traits in response to certain stimuli. In addition, the temperament 
trait finds expression in behavior directly, and is largely unstandard- 
ized and unverbalized. Yet in the temperament domain the investiga- 
tor is usually forced to rely on the subject’s verbalized responses to 
items in personality inventories. 

In a sense, each item in a personality inventory is a “test” in a 
battery. It is clear from the foregoing paragraphs that the investiga- 
tor assembling his first selection of inventory items to cover this more 
complex and diversified domain will probably be unable to define each 
so specifically that it is a relatively pure “test” of a particular
temperament trait or that the response to the item will be determined by
the degree to which a subject possesses a single temperament trait. 
The response can be expected to be the resultant of a combination of 
traits in different strengths. Thus the differentiation of responses 
achieved by a first-order analysis may well represent the different 
combinations of temperament traits or “temperament ratios” that 
occur in the experimental population. It is considered that the inter- 
pretation of second-order factors in this domain will provide some 
leverage on the problem of selecting items for subsequent studies 
which will allow us to circumvent this first differentiation of inven- 
tory items which reflect “temperament ratios.” 

The argument can be represented as follows. Let P, Q, and R rep- 
resent three behavior patterns which are relatively permanent for 
the individual. Let us assume that the responses to the items used in 
the study are in the majority of instances the resultant of two traits 
acting simultaneously. If there is a group of items for which the re- 
sponse is determined by elements of P and elements of Q (and this is 
facilitated when behavior pattern P is often associated with behav- 
ior pattern Q in the individual) then the simple structure is likely to 
reveal a first-order factor comprised of these components which we 
shall call PQ. By the same reasoning we could obtain a first-order 
factor PR. It is clear that these factors will be correlated. If the cor- 
relations between these primary factors are examined factorially, 
there should be at least one second-order factor on which both PQ and 
PR have substantial loadings. 

The interpretation of the second-order factor is determined by 
the common elements in the factors on which it has saturations. The 
interpretation of this particular second-order factor would thus pro- 
vide a description of the behavior pattern P. 
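The argument can be illustrated with a toy linear model. The sketch below assumes that P, Q, and R are independent standard normal traits and that the two first-order factors are the unweighted sums PQ = P + Q and PR = P + R; these are assumptions for illustration, not the author’s data or a factor analysis. Under them the correlation between PQ and PR is 1/2 and arises entirely from P, each primary correlates about .71 with P, and the average of the two primaries, which is P + (Q + R)/2, correlates about .82 with P. The simulated values should come out close to these theoretical figures.

```python
import math
import random


def corr(a, b):
    """Product-moment correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (w - mb) for u, w in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((w - mb) ** 2 for w in b)
    return sab / math.sqrt(saa * sbb)


def simulate_pq_pr(n=20000, seed=3):
    """Toy model: independent traits P, Q, R; primaries PQ = P + Q and PR = P + R."""
    rng = random.Random(seed)
    P = [rng.gauss(0, 1) for _ in range(n)]
    Q = [rng.gauss(0, 1) for _ in range(n)]
    R = [rng.gauss(0, 1) for _ in range(n)]
    PQ = [p + q for p, q in zip(P, Q)]
    PR = [p + r for p, r in zip(P, R)]
    # What the two primaries share: their average is P + (Q + R) / 2.
    shared = [(a + b) / 2 for a, b in zip(PQ, PR)]
    return corr(PQ, PR), corr(PQ, P), corr(shared, P)


r_primaries, r_single, r_shared = simulate_pq_pr()
print(round(r_primaries, 2), round(r_single, 2), round(r_shared, 2))
```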

If the majority of the items in a personality inventory were so 
ill-defined that the responses were the resultant of three (or more) 
traits acting together, the interpretation of the second-order factor 
would be more complex but might still lead to the isolation of a rela- 
tively stable behavior pattern. For example, if a second-order factor 
had saturations on two primaries representing traits PQR and PST 
respectively, its interpretation would be determined by P. If the com- 
binations of traits in the primaries happened to be PQR and PQS, 
the interpretation of the second-order factor would be determined by 
PQ. 

Certainly there is no assurance that single traits will emerge in 
a second-order analysis, but it seems more likely that they will so 













emerge than that any large number of them will appear in a first- 
order analysis based on the responses to a first selection of inven- 
tory items. On the basis of the arguments advanced it might be ad- 
vantageous, where possible, to investigate the third- and fourth-order 
factors. However, the original data employed would seldom, if ever, 
be stable enough to warrant such a step. 

The interpretation of the second-order factor is determined by 
the essential similarities of the inventory items involved. Having in 
this way achieved a clear concept of the basic nature of the items, we 
are in a position to examine critically and revise our original selec- 
tions of items or “tests” in the battery. This revision should allow us 
to obtain “purer” items, i.e., items for which the response is deter- 
mined predominantly by a single trait. It will then be possible to cir- 
cumvent the first systematization and to obtain some of our original 
second-order factors directly as first-order factors. 


Second-Order Analysis of the Guilford-Martin Data 

In order to test the feasibility of the procedures outlined above, 
the first step was a factorial study to determine whether or not sec- 
ond-order factors derived from personality inventories currently in 
use could be given clear and psychologically meaningful interpreta- 
tions. For this purpose Guilford’s inventory of scores STDCR (3) 
and the Guilford-Martin inventories of scores GAMIN (4) and O, Ag, 
and Co (5) were chosen. 

Thurstone (9) has shown that the intercorrelations between the 
thirteen sets of scores obtained from these three inventories can be 
described by nine linearly independent factors.* Thurstone named 
these factors as follows: R (Reflective); S (Sociable); E (Emotionally
Stable); V (Vigorous); D (Dominant or ascendant in the sense of social
leadership); A (Active); I (Impulsive); X₁ (tentatively designated as
Confident); X₂ (left without interpretation). The correlations between
these primary factors as given by Thurstone are reproduced in Table 1.

*In order to determine how many factors were represented in the 13
scores, Thurstone made the factorial analysis with the test reliabilities
in the diagonal cells. Thurstone’s factors are therefore described in
terms of the saturations on the Guilford scores. It should be noted that
since the communalities were not used in the diagonal cells, this
procedure does not constitute a second-order analysis.

The present writer undertook a second-order analysis of this
correlation matrix. The first estimate of the communalities was the
highest absolute value of the correlation coefficients in each successive
column of the matrix. The communalities were stabilized after three
successive factorings by the centroid method. The final orthogonal
factor matrix of four factors is given in Table 2. The fourth-factor
residuals are shown in Table 3. The orthogonal factor matrix was rotated
to simple structure.* The transformation matrix and the resulting
oblique factor matrix are given in Tables 4 and 5 respectively. The
oblique factors are labeled A, B, C, and D.
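The first step of the centroid procedure can be sketched as follows. The communality estimates are the highest absolute correlation in each column, as in the text, and the first centroid loadings are the column sums of the correlation matrix (with the estimates in the diagonal) divided by the square root of the grand total. This is a minimal sketch assuming a positive grand total; the sign reflection needed before extracting later centroid factors, the iteration until the communalities stabilize, and the rotation to simple structure are all omitted, and the function names are illustrative.

```python
import math


def highest_column_communalities(R):
    """First communality estimates: the largest |r| off the diagonal in each column."""
    n = len(R)
    return [max(abs(R[i][j]) for i in range(n) if i != j) for j in range(n)]


def first_centroid_factor(R, communalities):
    """First centroid loadings and the residual matrix after removing that factor.

    Assumes the grand sum of the matrix is positive; sign reflection for
    later centroid factors is not implemented here.
    """
    n = len(R)
    # Put the communality estimates in the diagonal cells.
    M = [[communalities[i] if i == j else R[i][j] for j in range(n)] for i in range(n)]
    col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    loadings = [c / math.sqrt(total) for c in col_sums]
    residual = [[M[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]
    return loadings, residual


true_loadings = [0.8, 0.6, 0.5]  # hypothetical one-factor example
R = [[1.0 if i == j else true_loadings[i] * true_loadings[j] for j in range(3)]
     for i in range(3)]
print(highest_column_communalities(R))
loadings, residual = first_centroid_factor(R, [a * a for a in true_loadings])
print([round(a, 3) for a in loadings])
```

For this rank-one matrix with the exact communalities .64, .36, and .25 in the diagonal, the centroid loadings equal .8, .6, and .5 up to rounding and the residuals vanish; with the highest-column estimates (.48, .48, .40) they would not, which is why the communalities are re-estimated over successive factorings.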


Interpretation of the Second-Order Factors 
Factor A has high positive saturations (.79) on Thurstone’s S 
(Sociable), (.73) on X₁ (Confident), and (.62) on E (Emotionally
Stable). In addition, there is a smaller, negative loading of —.46 on 
A (Active). Thurstone’s factors, in turn, have their highest satura- 
tions on the following Guilford-Martin scores: 


Factor A
Thurstone’s Factors            Saturations on Guilford-Martin Scores
Sociable                       .66 Agreeableness
                               .72 Cooperativeness
Confident                      .35 Freedom from Inferiority Feelings
                               .34 Objectivity
Emotionally Stable             .50 Emotional Stability
                               .50 Freedom from Depression
Active (negative)              .44 General Activity
                               .48 Cooperativeness





The emotionally toned responses in this factor are generally ad- 
justive. The negative loading on Active suggests placidity or an ab- 
sence of high-pressure or high-strung activity. The easy-going and 
uncomplicated behavior evident here has caused us to designate this 
factor Emotionally Stable. 

Factor B has only two high saturations, one of .85 on Thurs- 
tone’s I (Impulsive) and one of .80 on Thurstone’s D (Dominant or 
social leadership). Thurstone’s factors, in turn, have their highest 
saturations on the following Guilford-Martin scores: 


Factor B
Thurstone’s Factors            Saturations on Guilford-Martin Scores
Impulsive                      .60 General Activity
                               .45 Rhathymia (Carefreeness, etc.)
Dominant                       .55 Ascendance
                               .42 Social Extraversion


*One alternative rotation was indicated by the structure which introduced 
minor variations in two of the factors. 










The picture is one of impulsive, carefree, and generally outgoing 
behavior responses, all of which are facilitated by spontaneous reac- 
tion to stimuli. We designated this factor Primary Function, a term 
employed by G. Heymans (6) to describe very similar behavior. His 
conceptual scheme will be dealt with more fully later. 

Factor C has high positive saturations of .52 on Thurstone’s 
V (Vigorous), .49 on X₁ (Confident), .47 on E (Emotionally Stable),
and .40 on A (Active). Thurstone’s factors, in turn, have their high- 
est loadings on the following Guilford-Martin scores: 





Factor C
Thurstone’s Factors            Saturations on Guilford-Martin Scores
Vigorous                       .74 Masculinity
Confident                      .25 Freedom from Inferiority Feelings
                               .34 Objectivity
Emotionally Stable             .50 Emotional Stability
                               .50 Freedom from Depression
Active                         .44 General Activity
                               .48 Cooperativeness


The picture is one of vigorous and confident behavior responses 
free of the restricting influences of emotional instability. This free 
and vigorous behavior characteristic is remarkably similar to what 
Heymans calls Activity and this factor is so designated. 

Factor D has a saturation of .37 on Thurstone’s R (Reflective) 
and a high negative loading of —.55 on Thurstone’s E (Emotionally 
Stable). Thurstone’s factors, in turn, have their highest saturations 
on the following Guilford-Martin scores: 


Factor D
Thurstone’s Factors              Saturations on Guilford-Martin Scores
Reflective                       −.76 Thinking Extraversion
                                 −.41 Rhathymia (Carefreeness)
Emotionally Stable (negative)     .50 Emotional Stability
                                  .50 Freedom from Depression


The behavior pattern is one of thinking introversion, emotional 
instability, and depression combined with negative Rhathymia. These 
emotionally toned responses are, in general, nonadjustive and the 
designation Emotionally Unstable is selected for this factor. 

It will be remembered that Factor A was designated Emotionally 
Stable. The appearance of an Emotionally Stable and an Emotionally 
Unstable factor in a single study suggests that emotionally adjustive 
















and nonadjustive behavior responses are qualitatively different. This 
will be discussed more fully later. 


The Heymans-Wiersma Conceptual Scheme of Temperament Traits 

This conceptual scheme of temperament traits is relatively un- 
known in the United States, but has enjoyed greater popularity in 
Great Britain and in some of the dominions. It was devised by a Hol- 
lander, G. Heymans, who published his work at the beginning of the 
century. It was elaborated and refined by a number of his followers, 
including E. Wiersma (10), whose work included an investigation 
into the relationship between the temperament traits and the develop- 
ment of character and different personality Gestalten. 

The Heymans-Wiersma scheme utilizes three variables for a ty- 
pology. These are: (1) Primary-Secondary Function ; (2) Activity; 
(3) Emotionality. It is postulated that each of these is a continuous 
variable and that each occurs in every member of the population but 
in varying degrees of strength. Heymans defines them as follows: 


In general we call someone Emotional on the basis of 
the frequency and strength of his affective reactions, in pro- 
portion to their causes; Active on the basis of frequency and 
energy of his activities, in proportion to their motives; Pri- 
mary or Secondary Functioning according to the degree to 
which cognitive and affective processes ‘perseverate’ (Ger- 
man: nachwirken), in proportion to their importance. (1, 
p. 316) 


These conceptual traits were used by S. Biesheuvel, Chief Psy- 
chologist and Director of the Aptitude Tests Section of the South 
African Air Force during World War II, as part of a test battery for 
the selection of pilots. During the course of five years’ association 
with this organization the writer was able to observe and study in 
some detail the results achieved with this scheme of assessment. 

Of the three variables, Primary-Secondary Function is probably 
most easily described in terms of specific behavior characteristics. An 
individual at the Primary Function end of this behavior continuum 
is impulsive, lively, and distractible, since he responds readily to new 
stimuli. In addition to these characteristics, Biesheuvel (2) states 
that the primary-functioning individual will show oscillations in his 
rate of work and that work which demands constant concentration 
will never appeal to him. The primary-functioning individual will 
show similar variation in mood, though the prevailing mood will be 
cheerfulness. Biesheuvel continues further: 










This cheerfulness will be unbraked and therefore far 
more unrestrained, gay and bubbling over than that of the 
S.F. . . . The P.F. are on the whole mobile and restless, 
noisy, quick and on the move. . . . The P.F.’s are impulsive 
because they react to the stimulus of the moment, the desire 
or impulse of the moment. (2, p. 7) 


These characteristics are well represented in Factor B in our second- 
order analysis, which was accordingly designated Primary Function. 

The Heymans-Wiersma concept of Activity would seem to be the 
expression of general vigor: mental, physical, or both. Biesheuvel (2, 
p. 11) writes, “Activity further facilitates enthusiasm and optimism,
counteracts over-cautiousness, variability, aggressive expression of
the emotions and emotional complexity.” In our second-order analysis,
Factor C has saturations on Thurstone’s Vigorous, Active, Confident, 
and Emotionally Stable factors. These vigorous and confident behav- 
ior responses which are unhampered by emotional complexity describe 
the central concepts of the Heymans-Wiersma Activity variable, and 
Factor C was accordingly generalized as Activity. 

The Heymans-Wiersma concept of Emotionality cannot be directly
related to the second-order factors obtained in this study. Factor
A is more descriptive of adjustive emotional responses and has been 
called Emotionally Stable, while Factor D describes maladjustive 
emotional responses and has been called Emotionally Unstable. 
Whether general Emotionality is the trait underlying both factors or 
whether the maladjustive and adjustive emotional responses are quali- 
tatively different and relatively permanent for the individual, so that 
they will always appear as different factors, is a matter for further 
investigation. If this should prove to be the case, they would be more 
useful for the description of human behavior than the over-all concept 
of Emotionality. 

In their original inventories Guilford and Martin utilized 511 
different personality items. Our analysis has allowed us to describe 
these by four second-order factors, each of which could be given a 
psychologically meaningful interpretation. 

It is of considerable interest that two of these second-order fac- 
tors (Primary Function and Activity) are very similar to tempera- 
ment traits included in a conceptual scheme described by Heymans 
and Wiersma. The two remaining second-order factors may be related 
to their Emotionality variable. It is suggested that second-order 
analyses which produce factors which represent the more basic and 
pervasive characteristics of the original items in an inventory may 
















provide a fruitful means of comparing and unifying the many and 
varied first-order temperament factors described by different inves- 
tigators in this field. 


A First-Order Factorial Investigation 
A new factorial experiment was designed using tests, or in this 
instance inventory items, which would cover the concepts embodied 
in the factors described in the second-order analysis of the Guilford- 
Martin data and, in addition, the concepts embodied in the Heymans- 
Wiersma general Emotionality factor in order to investigate the fol- 
lowing specific questions: 


(1) Can the second-order factors be obtained as first-order fac- 
tors in a new analysis when the “tests” or items are selected with 
some knowledge of the temperament traits which determine the re- 
sponses to the items? In other words, can we obtain relatively pure 
tests of the traits and so circumvent the first differentiation
according to combinations of traits or “temperament ratios”?

(2) Can we obtain a general Emotionality factor directly, or 
will emotional responses again be differentiated in terms of adjustive 
or nonadjustive behavior characteristics?


The items finally chosen to cover the concepts embodied in the 
factors from the second-order analysis and the Heymans-Wiersma 
Emotionality variable are given below. A number of psychologists 
collaborated in an attempt to exclude items which were ambiguous or 
were synonyms of others on the list. 


List of Behavior Items*


1. Agreeable               12. Impulsive
2. Cheerful                13. Lively
3. Cooperative             14. Persevering
4. Decisive                15. Prompt Starter
5. Demonstrative           16. Quick Worker
6. Emotionally Stable      17. Seeks Company
7. Energetic               18. Self-confident
8. Enthusiastic            19. Socially at Ease
9. Even-tempered           20. Steady Worker
10. Happy-go-lucky         21. Sympathetic
11. High-strung            22. Talkative


*In so far as it was compatible with the concepts to be covered, the items 
represent socially acceptable behavior. A list composed of the antonyms of these 
words was treated separately and used in another study. 













A modified form of the paired comparison technique was used for 
the assessment of these items. The 22 items were combined in all
possible pairs and presented randomly in a single schedule.

For each pair, the rater was asked to underline that item which, 
in general, was more descriptive of the behavior of the person he was 
rating. The rater was urged to make a choice of one word in each pair 
whenever possible, but was permitted to mark both words of a pair 
when he considered that they were equally descriptive of the person 
being rated, and to leave both words of a pair unmarked when he was 
convinced that neither was in any way descriptive of the behavior 
of the person being assessed. For each person, the score for each item 
was the number of times it was underlined in the schedule. Completed 
schedules were obtained for a sample of 200 subjects.* 
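The construction and scoring of the schedule can be sketched in a few lines of Python. All pairs of the 22 items give $22 \cdot 21/2 = 231$ comparisons; each item’s score is the number of times it was underlined, and a pair may have no word, one word, or both words underlined, as the instructions allowed. The data layout (a list of pairs and a parallel list of sets of underlined words) and the function names are assumptions for illustration.

```python
import random
from itertools import combinations


def make_schedule(items, seed=0):
    """All possible pairs of items, presented in a random order."""
    pairs = list(combinations(items, 2))
    random.Random(seed).shuffle(pairs)
    return pairs


def score_schedule(schedule, marks):
    """Count how often each item was underlined for one person rated.

    marks[k] is the set of words underlined in schedule[k]: empty when
    neither word applies, one word, or both when equally descriptive.
    """
    scores = {item: 0 for pair in schedule for item in pair}
    for pair, underlined in zip(schedule, marks):
        for item in underlined:
            if item not in pair:
                raise ValueError(f"{item!r} is not in the pair {pair}")
            scores[item] += 1
    return scores


items = ["Agreeable", "Cheerful", "Cooperative"]  # first three of the 22 items
schedule = list(combinations(items, 2))           # fixed order for the example
marks = [{"Cheerful"}, {"Agreeable", "Cooperative"}, {"Cheerful"}]
print(score_schedule(schedule, marks))
print(len(make_schedule(range(22))))
```

Here Cheerful is underlined twice and the other two words once each; the full 22-item schedule has 231 pairs.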

The product-moment intercorrelation coefficients were calculated 
for the 22 items. The correlation matrix is given in Table 6. The com- 
munalities were stabilized after two successive factorings by the cen- 
troid method. The final orthogonal factor matrix had six columns and 
is given in Table 7. A frequency distribution of the sixth-factor resid- 
uals is given in Table 8. The orthogonal factor matrix was rotated to 
simple structure. The transformation matrix and the resulting oblique 
factor matrix are given in Tables 9 and 10. 

Four of the six rotated factors allow of a clear interpretation. 
The remaining factors are given only tentative interpretations. The 
significant saturations for each of the six factors were as follows: 


Factor A 
Code Number Item Saturation 

12 Impulsive —.56 

5 Demonstrative —.42 
10 Happy-go-lucky —.32 
20 Steady Worker +.47 
14 Persevering +.45 
15 Prompt Starter +.29 


This bipolar factor is well represented in both directions and 
could be designated according to either pole. One end of the bipolarity 
is described by Impulsive, Happy-go-lucky, and Demonstrative, which 
are all out-going behavior responses, combined with variability and 
distractibility as opposed to steady perseverance. These are the
essential characteristics of Primary Function, and the factor is so
designated.

*Use of the paired comparison method of obtaining judgments, even though
modified, may restrict the extent to which the results can be generalized. The
method was used in order to minimize the more serious distortions which are often
introduced by the halo effect when rating scale judgments are used.


Factor B
Code Number   Item                  Saturation
 2            Cheerful                 +.56
 9            Even-tempered            +.46
 6            Emotionally Stable       +.42
 1            Agreeable                +.30
10            Happy-go-lucky           +.28
11            High-strung              −.60
12            Impulsive                −.56
 5            Demonstrative            −.41


One end of this bipolar factor is described by placid, stable, and 
considered behavior responses, combined with a warm feeling tone 
denoted by cheerfulness and agreeableness. This pleasant, placid, and 
stable behavior is similar to that described by Factor A in the second- 
order analysis, and the present factor is therefore similarly desig- 
nated Emotionally Stable. 


Factor C
Code Number   Item                  Saturation
17            Seeks Company            −.52
22            Talkative                −.32
 4            Decisive                 +.42
18            Self-confident           +.42
 6            Emotionally Stable       +.38
 7            Energetic                +.22


The positive loadings for this factor give a picture of vigorous
and confident behavior responses which are unhampered by emotional
complexity. These are the central characteristics of Factor C in the
second-order analysis (which had saturations on Thurstone’s Vigorous,
Active, Confident, and Emotionally Stable factors) and of the
Heymans-Wiersma Activity variable. The present factor has, in addition,
a substantial negative loading on Seeks Company and a smaller
negative loading on Talkative. Although these behavior characteristics
have not previously been included in a description of the Activity
factor, they do not seem to be incompatible with the general pattern
of behavior which it represents. It seems possible that the constructively
active person will not seek out company and will avoid idle
talkativeness. In view of these considerations, this factor is designated
Activity. It must be stressed again that the term “Active” as
used by Heymans, Wiersma, and Biesheuvel is not synonymous with
energetic. It refers to a broader concept of which general energy is
but one aspect.


Factor D
Code Number   Item                  Saturation
 7            Energetic                +.54
 8            Enthusiastic             +.53
13            Lively                   +.48
 9            Even-tempered            −.37
 6            Emotionally Stable       −.32


The behavior pattern is one of stimulability combined with mood- 
iness and emotional instability. This hypomanic behavior is the re- 
sultant of a combination of some of the elements of Primary Func- 
tion and Emotional Instability. It is considered that Factor D is an 
example of a first-order factor which is a combination of elements from 
different temperament traits. It is designated Hypomania (Primary 
Function and Emotionally Unstable). 

The structure was not very clear for Factors E and F, and we
have some reservations concerning the interpretation of these factors,
especially of Factor E. Factor E has positive loadings of .60 on
3 (Cooperative), .50 on 21 (Sympathetic), and .39 on 1 (Agreeable),
and its lowest loading is a negative one, −.34 on 10 (Happy-go-lucky).
We may think of this factor as portraying amiability.

Factor F has positive loadings of .51 on 19 (Socially at Ease),
.43 on 17 (Seeks Company), and .32 on 18 (Self-confident), with
negative loadings of −.55 on 16 (Quick Worker), −.32 on 15 (Prompt
Starter), and −.28 on 12 (Impulsive). This factor describes the
socially comfortable, pleasure-seeking individual who is slow and
dawdling as far as physical output and work are concerned. This “social
butterfly” type of personality development is quite common, but
because of the lack of corroborative experimental evidence this factor
cannot be designated with assurance.

It will be seen that no factor was obtained in this analysis which
was similar to the general Emotionality factor postulated in the
Heymans-Wiersma scheme.

The correlations between these primary factors were calculated
by the formula developed by Thurstone (7, p. 188) and are given in
Table 11. These correlations give interesting additional information
concerning the factors.
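The step from a transformation matrix to primary-factor correlations can be sketched as follows, assuming NumPy and unit-length reference vectors in the columns of Λ. With C = Λ'Λ holding the cosines between reference vectors, the primary factors are the dual axes, and their correlation matrix is D C⁻¹ D, where D rescales to a unit diagonal. We assume this is equivalent to the formula cited from Thurstone (7, p. 188) but have not reproduced that page, and the function name is hypothetical.

```python
import numpy as np


def primary_factor_correlations(lam):
    """Correlations among primary factors from a square transformation
    matrix whose columns are unit-length reference vectors (assumed).

    C = L'L gives the cosines between reference vectors; the primary axes
    are their duals, with correlations D C^-1 D, D = diag(C^-1)^(-1/2).
    """
    lam = np.asarray(lam, dtype=float)
    c = lam.T @ lam
    c_inv = np.linalg.inv(c)
    d = np.diag(1.0 / np.sqrt(np.diag(c_inv)))
    return d @ c_inv @ d


# Two reference vectors 60 degrees apart (cosine .5).
theta = np.deg2rad(60)
lam = np.array([[1.0, np.cos(theta)], [0.0, np.sin(theta)]])
print(round(float(primary_factor_correlations(lam)[0, 1]), 3))  # -0.5
```

The two-dimensional case shows why reference-vector cosines cannot be read directly as factor correlations: a positive cosine between reference vectors corresponds to a negative correlation between the primary factors.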

A negative association between Primary Function and Activity
has been mentioned by both Biesheuvel (1, p. 333f.) and Heymans (6,
p. 271) and is indicated in Table 11 by a correlation of −.504.
Heymans (6, p. 271) also considered that Primary Function was
positively associated with Emotionality. We obtained a correlation of
+.464 between the Primary Function factor and the Emotionally
Stable factor, which is probably due to the positive affect inherent
in the latter. The Hypomania factor represents an association between
elements of Primary Function and Emotional Instability. It
would seem that Primary Function can be associated with either
positive or negative affect, which is consistent with Heymans’
observations.

As we should expect, the Hypomania (Primary Function and
Emotionally Unstable) factor has a high positive correlation with
Primary Function (.667) and a high negative correlation with Activity
(−.503). We should expect the Primary Function elements in the
Hypomania factor to be positively associated with the affect component
of the Emotionally Stable factor. At the same time we should expect the
Emotionally Unstable elements in the Hypomania factor to be negatively
associated with the Emotionally Stable factor. These conflicting
tendencies are represented by the low correlation of +.186 between these
factors.

The correlations between the primary factors are consistent with 
other investigators’ observations concerning the associations between 
these traits. This fact tends to confirm the interpretations made in 
this study. 

A second-order analysis of the primary factors was not attempted 
for the following reasons. There were only six first-order factors of 
which four were interpreted with confidence and two were given only 
tentative interpretations. Under these circumstances it was
considered that the number of meaningful variables which would enter into
the second-order analysis would be too small to achieve the over- 
determination of factors which is required for an analysis to be scien- 
tifically convincing. 


Summary 
The theory was advanced that, when factorial studies of
temperament were based on the responses to inventory items which could
be expected to be the resultant of a combination of temperament
traits, the determination of second-order factors represented a
purification process. Under these circumstances the second-order factors
would be more likely to describe the functional unities underlying the
domain than the factors obtained directly in the first order.

It was shown empirically that the nine linearly independent fac- 
tors inherent in three Guilford-Martin personality inventories could 
be described by four clearly interpretable second-order factors. Three 
of these four second-order factors were obtained directly in the first- 
order in a new factorial investigation based on the responses to se- 
lected behavior items. Finally, it was shown that two of the original 
second-order factors which were obtained again directly in the first- 
order analysis were very similar to variables employed in a concep- 
tual scheme of temperament traits described by an independent in- 
vestigator. 


REFERENCES 


1. Biesheuvel, S. The nature of temperament. Transactions of the Royal Society 
of South Africa, 1935, 23, 311-360. 

2. Biesheuvel, S. The diagnosis of temperament. Unpublished guide for testers 
currently used by National Institute for Personnel Research, Johannesburg, 
South Africa. 

3. Guilford, J. P. An inventory of the factors STDCR. Beverly Hills, Califor- 
nia: Sheridan Supply Co., 1940. 

4. Guilford, J. P. and Martin, H. G. The Guilford-Martin inventory of factors 
GAMIN (Abridged edition). Beverly Hills, California: Sheridan Supply Co., 
1943. 

5. Guilford, J. P. and Martin, H. G. The Guilford-Martin personnel inven- 
tory I. Beverly Hills, California: Sheridan Supply Co., 1948. 

6. Heymans, G. Gesammelte kleinere Schriften zur Philosophie und Psychologie. 
Haag: Martinus Nijhoff, 1927. 

7. Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press,
1947.

8. Thurstone, L. L. Psychological implications of factor analysis. Amer. Psy- 
chologist, 1948, 3, 402-408. 

9. Thurstone, L. L. The dimensions of temperament. Psychometrika, 1951, 16, 
11-20. 

10. Wiersma, E. D. The formation of character. Verhandelingen der Koninklijke 
Nederlandsche Akademie van Wetenschappen (Tweede Sectie), Amsterdam, 
1938, Deel XXXVII, No. 4, 1-48. 


Manuscript received 3/12/51 
Revised manuscript received 6/1/51 










TABLE 1
Correlations between Thurstone’s Primary Factors, R

        R      S      E      V      D      A      I      X₁     X₂
R     1.00
S     −.11   1.00
E     −.23    .52   1.00
V      .15   −.08    .05   1.00
D      .07    .01    .04    .03   1.00
A      .11   −.37   −.18    .32   −.17   1.00
I     −.01   −.15   −.10   −.11    .71   −.26   1.00
X₁     .06    .56    .66    .00    .03   −.16   −.19   1.00
X₂    −.02   −.14   −.12   −.09   −.19    .04   −.22   −.01   1.00








TABLE 2
Orthogonal Factor Matrix F

        I      II     III    IV
R     −.18   −.08    .24    .35
S      .59    .40   −.16    .32
E      .65    .59    .12   −.24
V     −.14    .24    .53    .14
D      .47   −.50    .45    .03
A     −.57    .15     ?    −.16
I      .46   −.76    .30   −.07
X₁     .45    .65    .28    .26
X₂    −.18    .06   −.16   −.01

TABLE 3
Fourth-Factor Residuals

        R      S      E      V      D      A      I      X₁
R
S     −.04
E     −.01    .00
V     −.03   −.02   −.03
D     −.01   −.01   −.01   −.02
A     −.01    .02    .01    .03    .02
I     −.04   −.05   −.01   −.02   −.03   −.01
X₁     .03   −.01    .02    .01    .01    .06    .03
X₂    −.01   −.08   −.02   −.05   −.01   −.01   −.01    .07













TABLE 4
Transformation Matrix Λ

         A      B      C      D
I      .737   .427   .082   .266
II     .460  −.582   .487  −.241
III   −.098   .685   .831  −.098
IV     .485   .099  −.265   .929
TABLE 5
Oblique Factor Matrix V

        A      B      C      D
R     −.02    .10    .06    .87
S      .79   −.12    .00    .06
E      .62    .04    .47   −.55
V      .02    .15    .52    .06
D      .09    .80    .14   −.02
A     −.46   −.07    .40   −.07
I     −.07    .85   −.09   −.08
X₁     .73   −.02    .49   −.06
X₂    −.10   −.22   −.11    .04
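Tables 2, 4, and 5 are linked by V = FΛ, where, in Thurstone's notation, each column of Λ holds the direction cosines of one reference vector and should therefore have unit length. The sketch below assumes NumPy and the Table 4 values as transcribed here. It checks the column lengths to within rounding and projects row S of Table 2 onto reference vector A, which gives the .79 printed in Table 5. The names `LAMBDA` and `oblique_matrix` are hypothetical.

```python
import numpy as np

# Transformation matrix as transcribed from Table 4 (rows I-IV, columns A-D).
LAMBDA = np.array([
    [0.737, 0.427, 0.082, 0.266],
    [0.460, -0.582, 0.487, -0.241],
    [-0.098, 0.685, 0.831, -0.098],
    [0.485, 0.099, -0.265, 0.929],
])


def oblique_matrix(f, lam):
    """Project orthogonal loadings F onto the reference vectors: V = F Lambda."""
    return np.asarray(f, dtype=float) @ np.asarray(lam, dtype=float)


# Column lengths should be 1 up to the rounding of the printed values.
print(np.round(np.linalg.norm(LAMBDA, axis=0), 3))

# Row S of Table 2 projected onto reference vector A.
s_row = np.array([0.59, 0.40, -0.16, 0.32])
print(round(float(oblique_matrix(s_row, LAMBDA)[0]), 2))  # 0.79
```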














TABLE 6
Correlation Matrix










TABLE 7
Orthogonal Factor Matrix F

        I      II     III    IV     V      VI
 1     .28    .63   −.23    .19    .14    .01
 2    −.21    .68    .04    .36   −.15   −.05
 3      ?     .43   −.35    .16    .33    .11
 4     .43   −.47    .47   −.30    .04   −.05
 5    −.63   −.09   −.19   −.29    .03   −.12
 6     .62    .42    .29   −.13   −.07   −.30
 7    −.22   −.39    .38    .29   −.18    .04
 8    −.53    .04    .15    .32    .27    .08
 9     .54    .56   −.10    .04   −.16   −.10
10    −.50    .33    .10    .05   −.24   −.31
11    −.43   −.59   −.29   −.16    .10    .04
12    −.62   −.43   −.32   −.17    .12   −.30
13    −.71   −.07    .24    .31    .10   −.03
14     .68   −.39    .12   −.03   −.04    .25
15     .37   −.60   −.08    .19   −.25    .06
16     .25   −.59   −.14    .24   −.08   −.23
17    −.49    .29   −.09   −.17   −.27    .31
18     .24    .16    .63   −.39    .04   −.09
19    −.07    .40    .29   −.39    .09    .13
20     .15   −.31    .06    .13   −.17    .19
21     .41    .36   −.38    .14    .28    .07
22    −.61    .03   −.10    .07   −.18    .05

























TABLE 8
Frequency Distribution of Sixth-Factor Residuals*
(N = 462)

Residual   Frequency
 −.15          2
 −.14          0
 −.13          0
 −.12          2
 −.11          2
 −.10          0
 −.09          4
 −.08          4
 −.07         20
 −.06         16
 −.05         30
 −.04         38
 −.03         58
 −.02         50
 −.01         50
  .00         52
  .01         42
  .02         50
  .03         14
  .04         10
  .05          6
  .06          6
  .07          6

*This is the full table of residuals in which each residual is given
twice. There are three residuals with an absolute value greater than
.10. A reviewer has pointed out that it may have been possible to
extract additional factors. This is so. However, a good structure was
obtained for the six factors extracted, and it seems unlikely that the
extraction of additional factors would have had a significant effect
on the interpretation of the factors in this study.
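The bookkeeping behind Table 8 can be checked with a short sketch. Counting every off-diagonal cell of a 22 × 22 residual table gives 22 × 21 = 462 entries, each residual appearing twice, so the three residuals beyond .10 show up as six tabulated cases. The Python below assumes NumPy, reads the printed −.08 that falls between −.04 and −.02 as −.03, and uses hypothetical names.

```python
from collections import Counter
import numpy as np

# Frequencies transcribed from Table 8 (residual in hundredths -> count).
TABLE_8 = {-15: 2, -14: 0, -13: 0, -12: 2, -11: 2, -10: 0, -9: 4, -8: 4,
           -7: 20, -6: 16, -5: 30, -4: 38, -3: 58, -2: 50, -1: 50, 0: 52,
           1: 42, 2: 50, 3: 14, 4: 10, 5: 6, 6: 6, 7: 6}


def residual_distribution(residual):
    """Frequency of off-diagonal residuals, in hundredths, over the full
    square table, so each residual r_jk = r_kj is counted twice."""
    res = np.asarray(residual, dtype=float)
    p = res.shape[0]
    off = res[~np.eye(p, dtype=bool)]
    return Counter(int(round(x * 100)) for x in off)


n_items = 22
print(sum(TABLE_8.values()), n_items * (n_items - 1))  # 462 462
# Distinct residuals with |r| > .10: each is tabulated twice.
print(sum(f for r, f in TABLE_8.items() if abs(r) > 10) // 2)  # 3
```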







TABLE 9
Transformation Matrix Λ








A 


B Cc 


D 


E 





I 261 
II —.102 
III 345 
IV 338 
V —.201 
VI 805 


—.542 
—.118 


227 227 
453 —.086 
321 439 
577 —.061 

599 
—.621 


—.237 
—.158 
434 
495 
555 
420 


137 
238 
—.397 
—.002 
-770 
418 





TABLE 10
Oblique Factor Matrix V








A 


B Cc 


D 


E 


F 





—.02 
.00 
02 
AT 

—.42 

—.05 
21 
.03 
01 

—.32 

—.20 

—.56 

—.03 
45 
29 

—.02 
.06 
05 
.00 
AT 

—.09 

—.09 


CoOnNauprp OD 


20 
21 
22 


30, —.02 
56° —.17 
08  .08 
| a 4 
oe ee: | 
42 «88 
—— 
——— 
46 ~—-.00 
23 —.05 
i << Zpe 
mi. wa 
01 01 
00 07 
03 —.09 
ee! 6 
00 —.52 
09 42 
—-03 7 
19  .00 
ate is 
ond... alll 


—.09 
03 
—.01 
.03 
—.09 
—.32 
54 
53 
—.387 
—.13 
06 
—.06 
48 
.02 
—.05 
—.05 
—.07 
—.02 
—.01 
—.05 
01 
05 


39" 
—.01 
60’ 
—.28 
—.06 
ail 
—12 
12 
08 
—.34 
01 
—.09 
ng 35 
03 
—22 
ait’ 
—.04 
38 
.09 
—.04 
50 
—.15 


03 
03 
01 
.03 
05 


—.01 
—.12 


01 
.00 


—.07 
—.10 








TABLE 11
Correlations between the Primary Factors, R

                       Primary    Emot.
                       Function   Stable    Activity
                          A         B          C
A  Pr. Function         1.000
B  Emot. Stable          .464     1.000
C  Activity             −.504     −.072      1.000
D  Hypomania             .667      .186      −.503
E                       −.099      .339       .070
F                        .430      .327       .107